PSYCHOLOGY IN JAPAN 
KOJI SATO and C. H. GRAHAM 
Kyoto University and Columbia University 


This paper presents a survey of 
Japanese psychology, first, from a 
historical point of view, and second, 
from the point of view of describing 
some of its modern aspects. The occa- 
sion for organizing the paper arose in 
the summer of 1952 when the two 
authors were members of the Seminar 
in Experimental Psychology, one of 
the Kyoto Seminars in American 
Studies conducted under the auspices 
of Kyoto University, Doshisha Uni- 
versity, the University of Illinois, and 
the Rockefeller Foundation. Many of 
the ideas expressed and information 
contained in the account are products 
of the many discussions that went on 
among the Seminar participants 
along with the formal academic
activities.1

The present survey attempts, with- 
in its limitations, to trace, in general 
outline and without claim to ex- 
haustiveness, certain historical pat- 
terns of Japanese psychology and 
their emergence into the modern 
stream of fact, theory, and practice. 
The account enumerates many
researches that have been carried out
by Japanese investigators, researches
that are not usually well known in
the West. It is hoped that the pres- 
ent description will stimulate inter- 
est in these contributions. 


1 We are greatly indebted to Mrs. Harouye 
Fukuhara of Doshisha University, Kyoto, for 
her generous help in accumulating materials 
on which this report is based. 


HISTORY OF PSYCHOLOGY IN JAPAN 
Psychology until 1926 


Western psychology was introduced
in Japan about 10 years after
the beginning of the Meiji Era
(1868-1912) against an ancient background
of Indian, Buddhist, and Chinese
philosophy. Once the introduction
was effected, translations were
quickly made of books by Haven, 
Bain, Sully, Wundt, and Ladd. The 
first experimental researches of Japa- 
nese psychologists were performed in 
America and Germany in the last 
decade of the nineteenth century by 
such men as Motora (111), Matsu- 
moto (87, 88, 89, 90, 91), Nakajima 
(116), T. Okabe (142), and Kakise 
(43). 

Yujiro Motora (1858-1912) and 
Matataro Matsumoto (1865-1943) 
were the principal pioneers of Japa- 
nese psychology. Motora was the 
first professor of psychology in Tokyo 
University. He had a philosophic
mind and tried to establish a system 
of psychology. Matsumoto was
primarily an experimentalist. He
designed the psychological laboratory
of Tokyo University in 1903, after
inspecting laboratories in America
and Germany, and established a
laboratory at Kyoto University in
1907. He became the professor of
psychology at Kyoto in 1906 and
succeeded Motora at Tokyo after
the latter’s death in 1913. He
advocated the study of
“psychocinematics,” a science of
psychophysiological behavior, and
became the father of applied
experimental psychology in Japan.
He survived Motora by 31 years.
Most of the
older leaders of Japanese psychology 
are his students. 

The first 25 years of the twentieth 
century constituted an orientation 
period in the methods of psychologi- 
cal research. Before 1925 studies ap- 
peared under the names of such re- 
searchers as Chiba (16, 17), K. Ma- 
suda (83, 84, 85), Yh. Kubo (64, 65, 
66), K. Tanaka (183, 184, 185), G. 
Kuroda (70, 71), Sakuma (154, 155, 
156, 157), Koga (58), Chiwa (18), and 
Narasaki (119). In 1911, Ohtsuki 
(138) published his Experimental 
Psychology, a book of 1,000 pages, 
which owes its material to Wundt, 
Titchener, and other German and 
American psychologists. The preva- 
lent schools of psychology in Japan 
at this time were Wundt’s appercep- 
tion psychology and American func- 
tionalism. Students of structuralism, 
such as Yokoyama (218, 219) and 
Takagi (175), came at the end of this 
period, as did Yatabe, who had just 
returned from Piéron’s laboratory. 

Experimental studies had started 
in the psychological laboratories of 
Tokyo and Kyoto universities near 
the beginning of the century. These 
works were first published as mono- 
graphs, and it was not until the Japa- 
nese Journal of Psychology was 
founded in 1919 at Kyoto by Genji 
Kuroda that a ready means of publi- 
cation became available. The edi- 
torial office was moved to Tokyo in 
1923. 

With the advent of better economic 
conditions after World War I the 
number of educational institutions in- 
creased greatly. Departments of psy- 
chology were established in Tohoku 
(Sendai), Kyushu (Fukuoka), Keijo 
(Seoul, Korea), Taihoku (Taipei, 
Formosa), Nihon, Hosei, Waseda, 
Keio (the latter four in Tokyo), 
Doshisha (Kyoto), Kwansei Gakuin 
University (Kobe), and the Tokyo 
and Hiroshima Colleges of Arts and 
Science. In addition, many psycholo- 
gists obtained positions in the Koto- 
gakko (essentially junior colleges). 

It is not surprising that the increase 
in Japanese educational facilities was 
accompanied by a quickened interest 
in educational psychology, an inter- 
est that was becoming strongly estab- 
lished by World War I. The mental 
test movement found a home in Ja- 
pan, but many of its technical aspects 
were not developed along the lines 
that prevailed in America. 

Studies of personality did not be- 
come important in number until the 
1930's, but it is important to note 
that Watanabe wrote a book on per- 
sonality (202) as early as 1912. In- 
dustrial psychology was recognized 
by some people as an important field 
of research during World War I, but, 
probably due to problems peculiar to 
Japan because of overpopulation, it 
did not develop (nor has it yet de- 
veloped) to the same important posi- 
tion it holds in the United States. 
In clinical psychology, the psychi- 
atric institutes of Tokyo and Kyoto 
universities participated in the men- 
tal test movement and devised psy- 
chodiagnostic methods, but only a 
few individual psychologists, e.g., 
Oguma (133), took an interest in 
problems of abnormal psychology. 
Kuwata (75) of Tokyo University 
introduced the folk psychology of 
Wundt, and Iritani (36) the social 
psychology of McDougall. 

In general, it may be said that de- 
spite the existence of certain scat- 
tered interests in other areas, psychol- 
ogy in Japan until 1926 was largely 
dominated by educational psychology 
and classical experimental psychol- 
ogy. 

One major contribution of great in- 
fluence that was to have an impor- 
tant bearing on later developments in 
Japanese psychology appeared in 
1925. In this year Matsumoto wrote 
his comprehensive Psychology of Intel- 
ligence (90), a book that dealt not 
only with intelligence, but also with
senescence, mental work, environmental
factors, efficiency, and military
psychology. In addition it included
a broad commentary on the
general development of psychology.
It involved a survey of research done
by Japanese psychologists under his
(Matsumoto’s) guidance as well as a
consideration of related work done by
western psychologists. It was a
milestone in the development of
Japanese applied psychology.


From 1926 through World War II 


The beginning of the Showa Era 
(1926) marked the development of 
some important trends in Japanese 
psychology. At that time the influ- 
ence of gestalt psychology was be- 
ginning to be felt; the Japanese 
Psychological Association was es- 
tablished, as was the new series of the 
Japanese Journal of Psychology; and 
the number of graduates in psychol- 
ogy showed a remarkable increase 
due to the expanded system of higher 
education that developed after World 
War I. 

Sakuma (157) and Onoshima (146), 
who studied at Berlin University, 
were the chief expositors of gestalt 
psychology. Sakuma’s translation of 
Köhler’s Gestalt Psychology (154) is
well known. It can safely be said
that gestalt theory has colored the
thought of most Japanese psychologists
from about 1926 to World War II
(and even now exerts a powerful
influence). The years around 1926
also saw the introduction of the
works of the Leipzig school (e.g.,
those of Krueger and Volkelt) by
Iwai (40, 41) and others (159) of
Kyoto University. It would, however,
be wrong to think that these
influences were unaccompanied at this
time by original Japanese
contributions. Chiba proposed a theory of
“Eigenbewusstsein,” R. Kuroda the
psychology of “comprehension,” and
Sakuma a theory of “basic
consciousness” as directly experienced. These
contributions may be seen as an
attempt to revise the traditional theory
of consciousness in the light of some
characteristics of oriental culture.
Among books that appeared near
1926, several deserve particular
mention. K. Masuda’s Introduction to
Experimental Psychology, a manual
for beginners, appeared in 1926. The
same author’s Methodology of
Psychology, published in 1934, is an
authoritative work in general psychology.
Yh. Kubo compiled two volumes of
his important Handbook of
Experimental Psychology during 1926 and
1927, and Kido’s General Outline of
Psychology appeared in 1931.


Experimental Psychology 


What were some of the evident 
trends in experimental psychology 
from 1926 to World War II? 

Perception. In the years between 
1926 and 1939, the study of percep- 
tion progressed at a quickened pace, 
usually along lines dictated by gestalt 
psychology. A short enumeration 
of some of the studies undertaken 
may be used to indicate the areas of 
greatest interest. 

The Psychological Institute of Kyu- 
shu University published a series of 
experimental studies on the structure 
of perceptual space directed by 
Sakuma, Yatabe, and Akishige (8, 9, 
95, 158, 182, 206). Takagi (176)
compared the structural,
phenomenological, and gestalt views of
perception. Obonai (30, 121, 125, 126,
127) continued (and has to the
present time) his studies of
psychophysical induction in perception,
memory, and other areas. (Most of
the contributions are in the area of
visual perception, where Obonai has
tried to formulate laws of induction
as extensions of the laws of contrast
and confluence.)

Matsubayashi (86), an
ophthalmologist, performed his classical
experiments on depth in 1937 and 1938.
A number of investigations of the
visual constancies were carried out.
Akishige (8), Ibukiyama (27), and
others worked on size constancy;
Yt. Kubo (67) on shape constancy;
and Ogasawara (130) on the con- 
stancy of phenomenal velocity. Nishi 
(121) studied the moon illusion. 
Studies on the perception of
movement may be represented by the
investigations of Mukuno (113) on
tactile apparent movement, S. Kubo 
(63) on visual apparent movement, 
Hisata (27) on auditory apparent 
movement, Fukutomi (21) on delta 
movement, and Ogawa (131) on the 
seen path of real movement. Studies 
on form perception were carried out 
by Morinaga (102), Hayami and 
Miya (25), Yagi (203), and others. 
Studies of the relation between time 
and space (the “tau effect” and its
reverse) were performed by Abbe 
(1, 2) and by Suto (172). 

At this time Y. Wada (200) per- 
formed experiments on the time error 
for auditory stimuli. Other studies 
in audition were not extensive, but 
Yuki made an important contribu- 
tion when he published A Psychology 
of Tone (221) in 1933. In the field of 
gustatory sensitivity, Rikimaru (151) 
made a survey of taste blindness 
among Japanese, Chinese, and For- 
mosan natives. 

The area of eidetic imagery was 
investigated by Ohwaki and Masaki 
(80) and the former (139) published 
his book Eidetic Imagery of Children 
in 1935. 

The work in perception during this 
period was essentially the product of 
a relatively small number of Japanese 
psychologists. These psychologists 
were very active and contributed a 
number of results which are, unfor- 
tunately, not as well known in west- 
ern countries as they should be. 

Human learning, animal learning, 
and thinking. The field of human 
learning was not so intensively in- 
vestigated from 1926 to 1939 as was 
the area of perception. The studies 
that were done were generally per- 
formed within the framework of 
gestalt psychology. For example, 
Ushijima (197) studied the
achievement problem (Erfolgsproblem) in
learning, Amano (11) investigated
“memory traces” in the tradition of
the Wulf experiment, and problems 
of retroactive inhibition, proactive 
inhibition, and reproductive inhibi- 
tion were taken up by Sagara (152, 
153) and Maeda (76). 

Researches in the field of animal 
behavior were sparse, but a few in- 
teresting investigations were under- 
taken. R. Kuroda published studies 
on the hearing of reptiles as early as 
1923. Thereafter he extended his re- 
searches to the monkey, white rat, 
and tortoise. He wrote a Psychology 
of Animals (74) in 1936. Takagi 
(178) studied the influence of back- 
ground upon the transposition of se- 
lective responses to brightness (in the 
varied tit) and investigated form 
constancy in the tomtit (177). The 
transposition of selective responses 
was also studied by Ohtsuka (137) 
and Shirai (164). Odani (128), a psy- 
chologist and psychiatrist, studied 
the effects of cerebral lesions upon 
the visual and auditory discrimina- 
tion habits of white rats. Yamanou- 
chi, a biologist, published his Psy- 
chology of Animals in 1938. Hayashi 
(26), a student of Pavlov, pioneered 
in the study of conditioning. 

The general problem of thinking 
has been of great interest to Japanese 
psychologists. Kido (69), Mitui (96), 
and Tomeoka (189) of Hosei Uni- 
versity performed experimental ob- 
servations on the process of doubt 
and speculation. Sato (160), Abe 
(5), and Amano (12) analyzed the 
process of comparison by introspec- 
tive methods. Sato (160) has extended 
his interest in this general area to 
investigations of developmental as- 
pects of the comparison process. 


Personality 


In the thirties German character- 
ology was introduced into Japan, 
and some experiments (10) were per- 
formed within the framework of this 
viewpoint. These experiments con- 
stituted the first systematic work in 
the field of personality. For example, 
psychologists (7, 187) at Waseda 
University performed research in the 
context of Kretschmer’s typology, 
and Uchida (191), in particular, de- 
vised a continuous addition test, es- 
sentially a modification of Kraepel- 
in’s work. Uchida’s test is now in 
general use in Japan (190). During 
this period, Abe (6), Susukita (171), 
and others, under the guidance of 
Chiba, studied personality differences 
among Manchurian races. Masaki 
and Yoda (82) studied personality 
from the viewpoint of educational 
psychology and wrote A Psychology 
of Character in 1937. 
Educational and Clinical Psychology 

Educational psychology, one of the 
two oldest fields of psychological in- 
vestigation in Japan, showed consid- 
erable progress between 1926 and 
World War II. Studies of develop- 
ment, learning, personality, work, and 
fatigue increased greatly in number 
and had direct influences on educa- 
tion. 

The Institute for Child Welfare of 
Aiikukai and the Institute for Child 
Study in Nihon Women’s University 
played particularly important roles in 
researches in child psychology during 
the thirties. Tokyo (with Tanaka, 
Narasaki, Terazawa, and Takemasa 
[120, 181, 183]) and Hiroshima College 
of Arts and Science (with Kubo and 
Koga [57, 64, 66]) were centers of edu- 
cational psychology. Kyoto Uni- 
versity, with Nogami (123) in ado- 
lescent psychology, and Iwai, Kato, 
Sonohara, and Moriya in child
psychology, was a center of
developmental psychology. At this time Yh.
Kubo (66) wrote his Child Psychology
under the influence of Charlotte
Bühler, and Hatano introduced
Piaget’s work (24) to Japan. The main
weakness of the child studies carried
out in this period lay in the fact that
no provisions were made for
large-scale follow-up investigations.
Investigations in the areas of maturity and
old age were done almost solely by
Tachibana (174).

The investigation of children’s
personalities was taken up in the period
from 1920 to 1935. Intelligence tests
and personality tests were not
elaborately developed or used in Japan
until after Ohtomo, a student of
Judd, wrote his two-volume
Diagnostics of Education (136) in the
period 1928-1933. A revision of the
Binet-Terman tests of intelligence by
H. Suzuki (173), then a school
inspector of Osaka City, has been in
general use in Japan since 1930, and 
group intelligence tests were also 
adapted for Japanese use by Tanaka 
(185), Kirihara (53), Awaji (13), and 
others. The only personality tests 
developed during this period were a 
test of extroversion-introversion by 
Awaji and Okabe (14), and a revision 
of the Downey Will-Temperament 
test by Kirihara. 

In the 1930's psychologists, child 
psychologists, and educationists 
formed groups to study clinical and 
abnormal child psychology. Their 
studies mark the beginning of clinical 
psychology in Japan. Several child 
guidance clinics were established 
about 1930 (in Kyoto City, Kobe 
City, Hyogo Prefecture, Aichi Pre- 
fecture, and Tokyo City). At the 
same time a few clinics were estab- 
lished in prisons, courts, and reforma- 
tories under the Japanese Judicial 
Department and in guidance centers 
set up by the Welfare Department. 
The term clinical psychology was 
never used officially during this pe- 
riod, and the work of the few psy- 
chologists employed in the clinics was 
limited to diagnosis and the evalua- 
tion of intelligence and character. 
The work of the early clinicians and 
the work of the educational psychol- 
ogists overlapped, but their respec- 
tive efforts differed in one respect: the 
former extended their methods to 
meet the requirement of evaluating 
abnormal children and criminals. In 
a few cases the clinicians adopted 
methods typical of psychoanalysis
(Marui [79] and others).


Industrial Psychology 


This field, always at a disadvan- 
tage in Japan because of the ever- 
present overpopulation and its con- 
sequence, cheap labor, did not pro- 
gress as rapidly during the 1930's 
as some other areas of psychology. It 
has been mentioned that, during 
World War I, attention was paid to 
industrial psychology and some sys- 
tematic studies of the labor problems 
were made subsequent to 1926 by 
members of the Kurashiki Institute 
of the Science of Labor, founded in 
1921 by M. Ohara. (The Institute 
moved to Tokyo in 1937 and has
continued its work until the present
time. It has included several psy- 
chologists on its staff, Kirihara and 
Ueno [52, 192], for example.) The 
Prefectural Institute of Industrial 
Psychology of Osaka was founded 
about 1925. By the middle of the 
thirties the Ministry of Welfare and 
the Government Railways employed 
a number of psychologists for voca- 
tional guidance and the promotion 
of efficiency of labor. There have 
been two journal publications in this 
field: The Science of Labor and
Management. During the war human
resources were in short supply, and
many books on the management of
labor appeared. A considerable
number of psychologists entered the
Army and the Navy and took part 
in researches on aptitude testing and 
personnel problems. 


Social Psychology 


Wundt’s folk psychology and
McDougall’s social psychology were
studied at an early date by Japanese
psychologists, but research in the
social area did not develop for a long
time. The measurement of attitude
was first undertaken by Koga (56)
about 1934, and problems of race
differences (133) and group
psychology (46, 47, 97) interested
psychologists from about 1931 to World
War II. During this period,
problems of national morale were seriously
taken up by only a few psychologists,
due, probably, to the fact that in a
Japan dominated by the military
caste, such questions were considered
to be political or philosophical in
nature rather than psychological. 


General Observations 


During the period 1926 to World 
War II, Japanese psychologists were 
self-conscious and self-critical. Their 
science was young, and they were 
passing through a period of self-ex- 
amination and, to some degree, of 
confusion. Matsumoto, president of 
the Japanese Psychological Associa- 
tion in 1933, discussed (91) what he 
considered to be the three weak 
points of Japanese psychology: (a) 
lack of communication among psy- 
chologists, resulting in a decreased ef- 
fectiveness of research; (b) excessive 
use of group methods involving 
questionnaires, tests, etc., and the 
paucity of analytic experimental re- 
searches; (c) excessive speed in adopt- 
ing new, and probably transient, 
western psychological concepts,
without, in all cases, applying appropriate
historical criticisms and evaluation
of facts. 

The years near 1935 were the high 
watermark of Japanese psychology 
before World War II. The number of 
psychological journals was at a peak 
and psychologists were very active. 
The Tohoku Psychologica Folia (Sen- 
dai), the Japanese Journal of Experi- 
mental Psychology (Kyoto), and the 
Acta Psychologica Keijo (Seoul) be- 
came important scientific journals. 
The Japanese Journal of Educational 
Psychology (Tokyo), the Japanese 
Journal of Applied Psychology (Hiro- 
shima), and Animal Psyche (Tokyo), 
featuring articles in a somewhat 
popularized style, signaled the emer- 
gence and development of new and 
important areas. Two journals in 
fields related to psychology, Studies 
in the Science of Labor and Beiträge
zur Psychoanalyse (Sendai), became
influential. According to a survey by 
Nakamura and Nagasawa, about 100 
Japanese works were abstracted in 
the Psychological Abstracts between 
1933 and 1940. 

The number of reports given at the 
fourth meeting of the Japanese Psy- 
chological Association at Sendai in 
1933 was 84; at the fifth meeting at 
Tokyo in 1935, 126; and at the sixth 
meeting at Keijo, Korea, in 1937, 63. 
At these meetings, strictly experi- 
mental work constituted about 40 
per cent of the papers. Sensation and 
perception were the major interests, 
about one in four papers being in 
these areas. 

Psychological research in Japan 
was severely reduced at the time of 
the outbreak of the Sino-Japanese 
conflict in 1937 and its later
development into the Pacific war in 1941.
Several journals could not be con- 
tinued, and at the end of the war even 
the Japanese Journal of Psychology 
could not be published regularly. 
During the long period of strife a 
number of psychologists worked in 
the Navy and the Army, but, in 
general, it may be said that psycho- 
logical research was almost
completely interrupted. The number of
graduate students in psychology
dwindled to essentially zero.


From the End of World War II to the 
Present 


After World War II a great num- 
ber of changes took place in Japanese 
life, many of them attributable to 
the Occupation. Among other things, 
the changes include a new and
liberalized Constitution and a proposed
renovation of the Japanese educa- 
tional system. We cannot consider 
here all the factors behind the influx 
of students into universities and the 
influence exerted by the “new”
education. It is sufficient to say that the
number of graduate students in psy- 
chology increased considerably above 
the prewar level, and there was 
heightened interest in psychology, 
particularly in areas that had not 
been well developed previously. At 
the same time, the Japanese Psycho- 
logical Association greatly increased 
its membership, and its meetings 
were held annually. During the 1951, 
1952, and 1953 meetings, the number 
of reports presented exceeded 300. 
The percentage of the reports that 
could be termed experimental was 
about 30. 

A further discussion of trends may 
here be aided by a subject-by-subject 
consideration of work in the various 
areas. 


General Psychology 


Yatabe wrote his Introduction to 
Psychology (213) from the standpoint 
of general behavioristics (1950), and 
a Handbook of Psychology in 12
volumes is now being prepared under
the sponsorship of the Japanese So- 
ciety of Applied Psychology. A latent 
behavioristic influence has now be- 
come fairly well established. In 
1941 Imada translated Bridgman’s
Logic of Modern Physics (33) into
Japanese, and a criticism of
operationism was expressed in the same
year by Yatabe (209).


Experimental Psychology 


Handbooks of experimental psy- 
chology have been edited by Takagi 
and Kido, of which Volume I
(general methods of experimentation),
Volume II (vision), and Volume III 
(audition and other sensory experi- 
ence) have thus far been published 
(180). As to methodology, it is 
worth noting that interests in sto- 
chastics and factor analysis have in- 
creased among Japanese psycholo- 
gists since the war, and some re- 
searches are being conducted in these 
areas (Koga [57], Indo [34], Iwahara 
[39] and others). 

Perception. A book on the psy- 
chology of perception was published 
by Ogawa, Tanaka, and Osaka (149) 
in 1952. Recently, Obonai (127) has 
surveyed western and Japanese stud- 
ies on visual perception. 

Experiments on perception are, in 
general, maintaining ties with the 
earlier gestalt influence, now some- 
what mixed with behavioristic tend- 
encies. Interest continues to be 
shown in problems of space percep- 
tion generally (for example, by 
Akishige [9], Miyakawa [100], Osaka 
[148]), but problems of size and shape 
discrimination (as related to the per- 
ceptual constancies) seem to be par- 
ticularly popular (Kume [68], Ma- 
kino [77], Misumi [95]). Quantitative 
theories of space, as, for example, 
Luneburg’s, are received with con- 
siderable appreciation. 

The topic of figural aftereffects is 
being studied with enthusiasm, and 
in the opinion of one of us (CHG) 
Japanese studies in this area are 
being done very effectively (e.g., by 
Ogasawara [130], Azuma [15], Oyama 
[150], and others [30, 123]). Kakizaki 
is studying the effect of preceding 
conditions on retinal rivalry. Ex- 
periments on time errors have been 
done by Inomata (35), Nakajima 
(115), Ono (145), and others. Obo- 
nai’s “induction theory” and Yo- 
kose’s psychophysical research (207, 
216) on form perception are influen- 
tial contributions. 

Interest in psychophysiological 
problems has increased since the end 
of the war; this is due, among other 
influences, to the very original work 
of the physiologist Motokawa (107, 
108, 109) on the electrical excitability 
of the eye. Studies on electrical ex- 
citability are now being carried on in 
some psychological laboratories. 
They involve studies of the time 
course of various color effects, form 
distortions, contrast effects, etc. 

Many areas of perception are being 
examined, but it will not do at this 
time to describe the efforts in great 
detail. One trend may be worth men- 
tioning: the increased communica- 
tion of psychologists with physiol- 
ogists and people in related sciences. 

Audition has not become as ex- 
tensive a research interest as visual 
perception. Nevertheless, it is inter- 
esting to observe that Wada (201) 
has recently (1951) published a book 
with the same title as Yuki’s earlier 
volume, A Psychology of Tone. A 
recent publication concerns an analy- 
sis of Japanese vowels (110). 

Other sensory systems have not 
been studied extensively since the 
war. 

Animal and human learning;
thinking. The greatly increased interest in
animal experimentation is a major
postwar development. Many
experiments have been done recently at
the universities of Tokyo, Kyoto,
Tohoku, Hokkaido, and Osaka. The
central problem has been the con- 
troversy between field theory and 
reinforcement theory. Studies have 
been made on the problems of sub- 
goal responses (Umeoka [194] and 
Yagi [204]), latent learning (Nozawa 
[124], Asami [38]), and the effects of 
amount of reinforcement and amount 
of drive. Studies of reasoning (Sue- 
naga [169]), place learning, and ex- 
perimental neurosis (Murata [114]) 
are in progress at this time. It is 
also worth noting that a Society for 
the Study of Behavior Theory has 
been established by psychologists at 
the universities of Kyoto, Osaka, and 
Osaka City. 

Considerable interest also exists in 
problems of human learning. Recent 
studies have been concerned with 
retroactive and proactive inhibition 
(Sagara, Ishiwara [37], Umemoto 
[193]), and researches are in progress 
on questions of temporal factors in 
interpolated materials and of
generalization effects between original lists
and interpolated lists. Human condi- 
tioning experiments (61, 62) have 
been almost wholly restricted to the 
laboratory of Kwansei Gakuin Uni- 
versity under Kotake. Studies of 
transposition responses in children 
have been performed, and the develop- 
mental aspects of this type of be- 
havior are now being examined by 
Sato, Okano (144), Motoyoshi (112), 
and others. 

Little work is being performed on 
motor skills. 

Recently a book on the psychology 
of learning was published by Yagi, 
Umeoka, and Maeda (205). 

In the period 1946 to 1949, Yatabe 
published A History of Thinking 
(211) and three volumes of his Psy- 
chology of Thinking (212). Volume I 
of the latter work is on concept and 
meaning, Volume II on relations and 
reasoning, and Volume III on the 
thinking of animals. These books by 
Yatabe, together with his History of 
the Psychology of Will (210), are 
authoritative and comprehensive. 
They are not as well known outside of 
Japan as they should be. 

Personality. Research on
personality per se has not expanded after the
war to the same degree as some other 
areas of interest. In this connection, 
however, it is important to observe 
that many investigators are at work 
on experiments that are closely re- 
lated to this area, for example, those 
that deal with the influence of mo- 
tives on discrimination (92, 98, 162). 
In fact, most Japanese experimental 
psychologists show considerable in- 
terest in this topic and experiments 
are being done that bridge the bound- 
ary between studies of personality 
and perception (45). Some psycholo- 
gists (e.g., Kitamura and Imada) 
have recently considered the problem 
of the ego. 

Recently (1951) one of us (Sato) 
published A Psychology of
Personality (161) which discusses the
training procedures of Zen Buddhism
against a background of western 
studies. 


Developmental Psychology, Education- 
al Psychology, and Measurement 


The reform of the Japanese educa- 
tional system, instituted by the Oc- 
cupation, involved an important pro- 
gram of teacher education and re- 
education. The re-education program 
has placed great stress on the data of 
educational psychology, and, under 
the new system, many school teach- 
ers have been instructed in this area. 
If widespread knowledge of an area 
is an important contributor to its 
vitality, then educational psychology 
(81, 212) is very much alive in Japan. 
American psychologists who went to 
Japan during the Occupation were 
usually educational psychologists, 
and it may be expected that the re- 
sults of their efforts will show tan- 
gible returns in the way of intensified 
activity in this area.

Recently several books (22, 212) 
on the developmental aspects of edu- 
cational psychology have been trans- 
lated into Japanese; several books 
have been written on child psychol- 
ogy with an essentially educational 
emphasis, for example, one by Yama- 
shita (206); several books on the psy- 
chology of adolescence (Katsura [49], 
Ushijima [198]) have also appeared. 
Little attention has as yet been paid 
to problems of maturity and old age. 

Experimental studies of mental 
development are being carried on, 
notably by Sonohara (167) and 
Nakano (118). Takemasa published 
two volumes on Developmental Psy- 
chology (181) in 1948-1950. The 
large public interest in educational 
and developmental psychology is re- 
flected in the fact that journals in 
this area are edited for the general
public as well as the scientific. Em- 
phasis is placed on the problem of 
learning and its relation to vocational 
and educational guidance. 

Work continues in the general 
areas of intelligence and personality 
tests in several institutions, most 
notably in the National Research 
Institute of Education. Tanaka, for 
years one of the dominant figures in 
educational measurement, still re- 
mains a leader. 

An emphasis on mental hygiene 
characterizes postwar educational 
psychology, and some psychiatrists 
and psychologists are now working 
together. In Japan, the term ‘men- 
tal hygiene’ is used in its broadest 
sense to apply to normal as well as 
abnormal individuals. In student 
counseling work, many problems 
peculiar to Japan arise in a way that 
may have no parallel in the United 
States. In particular, political prob- 
lems are said to provide an “appar-
ent” focus of maladjustment in stu-
dents. In any case the student coun- 
selor has an important role to play 
in modern Japan. 


Clinical Psychology 


The status of clinical psychology in 
Japan has changed greatly in the
postwar period. The name “clinical
psychology” is now in common use 
and the Society of Clinical Psychol- 
ogy has been formed. A journal of 
clinical psychology (Clinical Psy- 
chology and Educational Counseling) 
has recently been established, and 
now, for the first time, psychoan- 
alytic theories are becoming common 
subjects of discussion. Projective 
tests, such as the TAT and Ror- 
schach, play important roles in diag- 
nosis, and nondirective methods of 
therapy are being studied (55, 106, 
168). In all of this, the influence of 
American clinical psychology is felt
strongly, but in addition, some truly 
Japanese methods, such as Morita’s 
(104), which are based upon a syn- 
thesis of Zen doctrines and western 
studies, are regarded favorably. 
Without doubt, clinical psychology is 
developing rapidly as an important 
field of endeavor in Japan. 


Industrial Psychology 


Although industrial psychology in 
Japan has not developed to the same 
degree as in America, it has never- 
theless been shown that a basic core 
of subject matter, technique, and 
practice did exist prior to World War 
II. During World War II problems 
of training and efficiency became 
serious, and a number of psychol- 
ogists worked in the Army and Navy, 
especially in the selection of aviators. 
Since the war, labor problems (train- 
ing and worker morale, especially) 
have received more attention than 
previously. Several prefectures have 
established research institutions for 
work in this area, and a few universi- 
ties have set up chairs of industrial 
psychology and vocational guidance. 
Work in selection shows little prog- 
ress, for in a land with 85 million 
people and disproportionately in-
adequate natural resources, such
procedures seem indeed to be fanci-
ful.


Social Psychology


After the war many social prob- 
lems forced themselves upon the at- 
tention of psychologists. At the same 
time, the theories and experiments 
of social psychologists in America 
flowed into Japan and stimulated the 
Japanese workers. In consequence, 
work in social psychology is now be- 
ing carried on at an accelerated pace. 
At the annual meeting of the Jap- 
anese Psychological Association in 
1952, about 40 studies were reported
by social psychologists and, if re- 
lated researches are added to this 
total, the number becomes consider- 
ably greater. 

In the seven years since the war, 
the topics of central interest have 
changed. At first, considerable ef- 
fort was spent on the analysis of 
social and cultural phenomena (3,
32) that occurred during and after 
the war. Later, attention was paid 
to problems of group dynamics in con- 
nection with the democratization of 
the Japanese people. More recently, 
problems of social perception, com- 
munication (28), and the measure- 
ment of attitude became a central 
problem not only for social psychol- 
ogists but for many others as well. 
Most recently, psychologists have 
become interested in systematiza-
tion (Ikeuchi [31], Suenaga [170],
and others). Problems have been
restated, and the possibility of treat-
ing social behavior mathematically
has been examined. Several re-
search groups have been formed,
for example, the Society for the
Research in Behavior Mechanisms
(at Tokyo), the Association for
Group Dynamics (at Kyushu), and
the Society for Youth Work (at
Tokyo). As yet, no special journals
have been established in social psy- 
chology, but a number of systematic 
treatises have appeared lately, no- 
tably those by Minami (94) and Shi- 
mizu (163). 


OVERVIEW 


Our survey points to the fact that 
psychology in Japan, like psychology 
in most countries, did not develop 
rapidly during the first 40 years of 
its existence. Circumstances im- 
proved after World War I, and after 
1926 psychology advanced in a surer 
fashion. By 1935 it gave promise, not 
only of becoming stronger but also of
becoming more diversified. Unfor- 
tunately, its favorable growth was 
severely inhibited by the events that 
led to the Sino-Japanese conflict and 
World War II. In effect, the war 
period became an interval of stagna- 
tion. Following the war, Japanese 
psychology underwent the readjust- 
ment common to all areas of Jap- 
anese life. At the present time, it 
shows signs of development to a new 
and stronger level. 


SOME OBSERVATIONS ON CURRENT 
ASPECTS OF JAPANESE PSYCHOLOGY

It is certain that the picture of pres-
ent-day psychology in Japan differs
in some details from the one that 
exists in America. The following dis- 
cussion is aimed at acquainting the 
American reader with some impor- 
tant facets of psychological endeavor 
in Japan. It is not intended that the
discussion will deal with “problems.”
Rather, it is hoped that the descrip-
tion of selected areas of difference
between Japan and America will 
acquaint the western reader with 
some facts that are essential to an 
understanding of psychology in a dif- 
ferent culture. 


Training 


Under the Occupation (1945-1952) 
a strong effort was made to remodel 
the educational system of Japan.²
All aspects of education felt the im- 
pact of the attempt, and for a while 
the training of psychologists came 
under scrutiny. For purposes of this 
discussion we shall speak of the pre- 
Occupation system as the “old” sys-
tem and the program suggested by
Occupation authorities as the “new”
system.


2 We are indebted to Dr. D. D. Smith,
Office of Naval Research, Washington, D. C., 
for information on certain educational policies 
of the Occupation. 
The old system had a number of
the characteristics of the British 
system. Education leading to grad- 
uate work was based upon the 6-5-3-3 
system: 6 years in primary school, 
5 years in middle school, 3 in pre- 
paratory school, and 3 in the uni- 
versity. On leaving junior college 
(cf. p. 444) the prospective university 
student had to take entrance exam- 
inations which covered many fields; 
each university provided its own 
examination. If a student passed the 
written examination, he was called in 
for an oral examination and an inter- 
view. It is important to notice that 
university study began in the 14th 
school year as in England, not in the 
12th school year as in America. 

During his university days, a stu- 
dent might concentrate in psychology. 
The courses required in the psychol- 
ogy program varied somewhat from 
university to university, but, in gen- 
eral, they seem to have covered the 
conventional range of subject matter 
despite such differences as may have 
existed among, for example, Kyoto, 
Tokyo, and Keio. Kyoto had (and 
has) a requirement in philosophy 
that does not exist at Tokyo, and 
Keio emphasizes work in natural sci- 
ences. 

Occupation authorities made an 
attempt, under the new system, to 
change the method of selecting uni- 
versity students for national univer- 
sities. (This change constituted a 
small segment of a program to recon- 
stitute the educational system into a 
6-3-3-4 sequence, analogous to the 
American scheme: 6 years of primary 
school, 3 of lower secondary, 3 of up- 
per school, and 4 of university.) It 
was planned under the new system 
that university applicants should 
take the same examinations at ap- 
proximately the same time through- 
out the nation. After a student had
“passed” the national aptitude test,
it was planned that he be invited to
the university of his choice for spe- 
cial examinations prepared by that 
university. If the student passed the 
examinations, he was to be inter- 
viewed. If he did not pass, he was to 
be permitted to have his files sent 
to one more national university. Pri- 
vate universities planned to operate 
differently, but the projected pattern 
was similar in many respects to the 
one here outlined. It turns out now 
that ways of utilizing the national 
aptitude test differ among different 
universities. Some universities use 
these tests as screening devices, and 
some use them only to provide addi- 
tional advisory material. 


Under either the old or new system, 
a student who graduated from a uni- 
versity could enter upon graduate 
work. Under the old system, no spe- 
cial courses were required during the 
period of graduate training. Once the
undergraduate courses were com-
pleted the student could apply for 
admission to the dean of the depart- 
ment and to the Faculty of Litera- 
ture (under which the psychology de- 
partment usually exists). Usually, 
as at Tokyo, entrance examinations 
were required only for people who 
came from other universities. Under 
these circumstances it turned out 
that the graduate students in a 
given university had usually been 
undergraduates in the same uni- 
versity. Very often the student 
stayed on as an assistant to the pro-
fessor for many years.³

The new program did not say much 
about selection procedures, but it 
did make the proposal that 30 credits 
of course work be required in the first 
year of graduate study. These cred-
its were to be accumulated in the
areas of experimental, clinical, social,
developmental, educational, and cer-
tain optional areas. So far, the new
program has not received as much
support as was expected. In 1953
about one-third of entering graduate
students embarked on the new course.

3 Most large Japanese universities are, in
fact, not freely open to graduates of other uni-
versities. Most scholars remain in the same
school from their undergraduate days until
they achieve professorial rank.

The main argument against the 
new program is that it is expensive 
both for the university and for the 
student. Under the old plan, gradu- 
ate students often received salaries 
as assistants, and the number of as- 
sistants required in a university par- 
tially determined the number of 
graduate students. Under the new 
plan, a student who has to devote his 
first year to study is not likely to re- 
ceive an assistant’s fee.⁴

Whatever may be its fate at other 
strata of the educational structure, 
the new system has a “hard row to 
hoe” at the level of graduate work. 
For this reason, it may here be more 
realistic to restrict the present dis- 
cussion of graduate training to the 
old system, unless otherwise specified. 

In general, few examinations are 
given during a man’s graduate work, 
and the requirements for reports on 
research vary from university to 
university. At Tokyo and Kyoto a 
research report must be made once a 
year, but at some other universities 
this requirement does not exist, the 
final thesis being the criterion for 
completion of the work. As has been
said, the major professor decides
when a man will finally stop his
training. A doctorate degree is
granted, if it is granted at all, usu-
ally after many years of professional
work. It is based upon a defense of
the thesis.

4 Where the new program does exist it has
often been confusingly intermixed with the
old program. The result is that few students
take the graduate courses required by the
new program to arrive at a possibly somewhat
higher level than that specified by the M.A.
degree in America. In the universities of
Tokyo and Kyoto, for example, the number
of students entering the graduate courses is
only four a year. This number, of course, is
small in comparison with the number of stu-
dents who receive the master’s degree every
year in America.

What opinions do Japanese psy- 
chologists have about ways of im- 
proving graduate training? The fol- 
lowing proposals were frequently en- 
countered and seem to represent the 
prevailing views of the Seminar partic- 
ipants: (a) establish a program lead- 
ing to a terminal doctor’s degree ob- 
tainable after a reasonable period of 
graduate study;⁵ (b) differentiate de-
partmental offerings into social, ap- 
plied, and experimental psychology; 
(c) appoint more professors in more 
diverse areas of psychology;⁶ (d) im-
prove the general level of undergrad- 
uate instruction; (e) specify formal 
prerequisites for graduate training in 
each area of psychology; (f) provide 
more adequately supervised work in 
the laboratory and in the clinic; (g) 
improve facilities for research and 
graduate training; (h) place psychol-
ogy in some other faculty than the 
Faculty of Literature. 

Some of the proposals may be car- 
ried out. It is possible, for example, 
that psychology may vacate its
place in the Faculty of Literature.
Other members of the Faculty even
now accept the fact that psychology
is misplaced. Certain government
officials seem to favor the reclassifica-
tion of psychology, but others do
not, probably on the grounds that, as
a recognized laboratory and clinical
discipline, it will become more ex-
pensive.

5 In Japan the M.D. degree is given after
several years of research. “Bungakuhakase”
(literally, Doctor of Literature, correspond-
ing to the western Ph.D.) is not usually given
until a psychologist approaches his fiftieth
birthday.

6 Two professorships in psychology exist in
each of the two most recently founded univer-
sities, Hokkaido and Osaka. Tokyo Uni-
versity has only recently obtained a second
professorship. The universities of Kyoto,
Tohoku, and Kyushu have only one chair.
This situation is to be contrasted with the
one that exists in philosophy. At Kyoto, for
instance, there are seven chairs of philosophy.
This ratio of philosophy professors to psychol-
ogy professors has remained unchanged since
1906.

While external agents are helping 
determine the position of psychology, 
psychologists themselves can do a 
great deal in the way of advancing 
training procedures. They can, for 
example, agree on well-chosen policies 
concerning the effective use of pres- 
ent facilities. (Facilities are, in fact, 
adequate for many purposes, and 
much excellent work can be produced 
with them. In a few universities [e.g.,
Kwansei Gakuin] the equipment used
in prevailing research is good.)⁷
Above all, establishing policies aimed 
at producing broadly trained psy- 
chologists, who are skillful in the use 
of research techniques, will constitute 
an important step.⁸


Professional Opportunities 


The lot of a psychologist in Japan 
has many desirable features, but, 
financially, it is no more (but prob- 
ably no less) attractive (in view of 
prevailing economic conditions) than 
that of many academic people in 
other lands. A full professor may 
earn $70 to $100 per month; an assist- 
ant professor, $50 to $90; an instruc-
tor, $20 to $85; and an assistant about
$20. (The salaries quoted include
appropriate allowance for housing,
etc.) The few jobs available in edu-
cation, government, and industry
probably provide slightly higher max-
imum salaries than the academic
positions.

7 Laboratory equipment was, in general,
not obtainable during the war. In conse-
quence much of the equipment, even in the
relatively new laboratories of Tohoku and
Kyushu universities, is old. The provision of
funds for laboratory equipment is not yet
adequate.

8 See, for general background information,
an article by M. Imada: Recent psychological
thinking in Japan. Ann. Proc. Dept. Psychol.
Kwansei Gakuin Univer., 1954, 1, 1-9.

A main flaw in the professional 
picture lies in the fact that profes- 
sional psychology is relatively un- 
developed in Japan. Psychology has 
not been exploited by government, 
education, and industry, as has been 
the case in the United States, and 
few positions exist for psycholo-
gists outside of universities. Until,
through the years, a demand is felt
for psychological work in other places
than psychology departments, it is
unlikely that professional opportuni-
ties will expand greatly. It now re-
mains for psychologists to exploit
circumstances in which the fruitful-
ness of psychological methods may
be shown to advantage.


Problems of Communication 


Psychologists in Japan feel iso- 
lated, and, in fact, they have been 
isolated, not only from western psy- 
chologists but, to a greater degree 
than they desire, from each other. 


The isolation from western psy- 
chologists is attributable to a great 
many factors, among which geo-
graphic position plays a minor role.
The lack of contact with the western
world that existed from 1937 to 1945
had its undesirable effects, and the
Occupation years (1945-1952) that
followed constituted a period of read-
justment that did not provide the
best circumstances for scientific give-
and-take. (During the Occupation a
number of American educational and
military psychologists worked with
Occupation officials, and, in fact, did
establish good contacts with Japa-
nese workers. However, these pleasant
relations remained indirect in most 
cases, because the Americans usually 
worked in administrative positions. 
It is to be hoped that more oppor- 
tunities for close personal relations 
in the seminar and laboratory, such 
as existed during the Kyoto Seminar 
in Experimental Psychology, will be 
established soon. It is encouraging to 
observe that the appointment of 
George M. Haslerud as Fulbright 
Professor at Kyoto and the visit of 
Koiti Motokawa to the United States 
during 1953-1954 signify steps in this 
direction.)

The barrier of language has, as 
much as any other factor, served to 
confirm the isolation of the Japanese. 
Despite the fact that Japanese train- 
ing in foreign languages is good, the 
training of western workers in Jap-
anese is bad. It has been the lot of
Japanese psychologists since their
beginnings to do research which re-
mains little known to the outside
world. In consequence, Japanese psy-
chologists have had to depend on the
approval of their fellow countrymen.
This situation may certainly have
been attended by undesirable effects,
which, fortunately, now seem to be
disappearing.

Certain devices for increasing com-
munication might be tentatively
recommended without commitment
as to their ultimate value. For ex-
ample, Japanese psychologists could
be encouraged to publish in other
languages than their own. (It is
worth observing in this connection
that, even now, Japanese psychol-
ogists, feeling a need to converse
more effectively with their western
colleagues, usually add extensive
English abstracts to their research
reports.) Other devices such as inter-
cultural seminars, exchange scholar-
ships, exchange professorships, inter-
cultural institutions, etc. might pro-
duce more intimate scientific rela- 
tions. Certainly such programs 
should be encouraged. 

During the war and for some time 
thereafter, Japanese psychologists 
had little contact with each other. 
The Japanese Psychological Associa- 
tion may well ask itself what meth- 
ods may be used to increase com- 
munication among its members. The 
planning of symposia on topics of 
general interest might be suggested 
as an area in which the Association 
can make effective contributions, and 
there are indications that action will 
be taken along this line. Finally, it
is worth observing that devices that
improve intercultural relations can 
also improve contacts among the 
Japanese. One of the chief merits of 
the Kyoto Seminar lay in the fact 
that it attracted psychologists from
widely separated areas of Japan and 
provided an appropriate focus for dis- 
cussion and mutual acquaintance. 


SUMMARY 


An account of Japanese psychology 
is presented, first from a historical
point of view, and second from the 
point of view of analyzing some of its 
modern aspects. 

The historical sequence is broken 
up into three parts: from the begin- 
nings of psychology (about 1880) un- 
til 1926, from 1926 through World 
War II, and from the end of World 
War II to the present. Some current 
aspects manifested by Japanese psy- 
chology are considered. The discus- 
sion of these matters centers about 
the topics of training, professional 
opportunities, and communication. 
The purpose of this review is to
describe the leaderless group discus- 
sion (LGD): its history, applicability, 
method of administration, and relia- 
bility and validity as a technique for 
assessing leadership potential. We 
shall discuss the major variants in 
procedure that have been tried, and 
indicate the effects of these variations 
on LGD reliability and validity. We 
shall not consider the initially leader- 
less discussion’s role as a basic re- 
search tool for studying the develop- 
ment of leadership as it has been 
employed by investigators such as 
Carter (29, 30, 31, 32), Pepinsky, 
Siegel, and Vanatta (61), and Bell 
and French (24), although we will 
make use of their findings wherever 
they shed light on the LGD as an 
assessment device. 


HISTORY OF THE LEADERLESS 
GROUP DISCUSSION


According to Ansbacher (2), the 
originator of the method was J. B. 
Rieffert, who directed German mili- 
tary psychology from 1920 to 1931. 
The technique, called by first users 
the Schlusskolloquium or Rundge-
spräch, was aimed at showing be-
havior “toward equal partners.” The
German Army used the procedure 
until about 1939, while the Navy 
continued employing it in their selec- 
tion programs until late in World
War II. Various German civilian
agencies now appear to be employing
the LGD as an assessment tool.

1 Several parts of this article were sum-
marized for presentation at a Symposium on
Situational Performance Tests, Sixty-first
Annual Meeting of the American Psychologi-
cal Association, Cleveland, Ohio, September
7, 1953.

Influenced by the German de- 
velopments in situational testing, by 
1942 the British War Selection Board 
introduced such tests into their bat- 
tery for selecting Army officer candi- 
dates. The basic series of leaderless 
group tests was evolved by Bion and 
included the LGD (41, 45, 70). A
similar program was established by 
the British Navy (70). 

At the end of the war, the LGD 
was employed by Fraser (39, 40) as a 
device for screening British manage- 
ment trainees, and by Vernon (68, 
69) for testing British Civil Service 
applicants. 

Similar developments took place in 
Australia (43, 65), South Africa (3), 
and Norway.²

The OSS Assessment Staff (59) ap- 
pears to have initiated use of the 
LGD in the United States late in 
World War II. American federal and 
state civil service examiners began 
trying out the technique at the end of 
the war (5, 26, 27, 53). 

Approximately 25 per cent of the
190 Civil Service agencies surveyed
by Fields (37) reported using the
LGD. A Federal Civil Service man-
ual appeared in 1952 (56). The LGD 
has also been employed by several 
American industrial and business
firms. Its rapid acceptance has led 
both Meyer (58) and Douglas (35) to 
caution about overestimating its 
validity and utility. 

2 Private correspondence from V. C. Jahl,
Chief Psychologist, Norwegian Armed Forces. 
We attribute the use of the LGD
primarily to its relative ease of ad-
ministration compared to individual
interviews, especially where large
numbers of applicants are involved,
as well as to its face validity com-
pared to paper-and-pencil tests.

It may be that the face validity of
the LGD really is what Gulliksen
(44) has labeled intrinsic validity. A
number of extrinsic validity studies
are now available which may justify
ex post facto the possibly widespread
premature acceptance of the LGD as
a valid measure of the tendency to
display successful leadership.

Several investigators have assumed
that the LGD is intrinsically valid
as a means of appraising successful
leadership behavior. Pepinsky,
Siegel, and Vanatta (61) have used
LGD performance as a criterion for
evaluating the effects of counseling;
Bell and French (24) and Carter and
his associates (29, 30, 31, 32) have
assumed that they are studying lead-
er behavior when observing discus-
sions.

Another reason for the continuing
interest in employing situational tests
to forecast or appraise leadership
behavior may be due to the nature of
leadership and psychometrics.

In reaction to the earlier emphasis
on the effects of individual differences
on leadership behavior, there has
been, more recently, an emphasis on
situational effects on leader behavior.
That both sources of variance are
important for leadership behavior
can be accounted for fully only after
analyzing the main effects and the
interaction effects of situational dif-
ferences and individual differences
in motivation, behavioral history, 
and biological level (maturity, hered- 
ity, and integrity of the CNS).³


3 This formulation was originally proposed
by W. P. Hurder.

It follows that any test designed to
forecast leadership potential and in- 
tended to have some generality of 
application would need to: (a) share 
elements in common with a variety 
of situations; (b) vary directly with
the situation for which predictions are 
to be made. 

Most psychometric test procedures 
have been developed to meet only the 
first requirement, for many behaviors 
are relatively little influenced by 
situational changes. Thus, one’s 
speed-of-arm movement or spatial 
visualization accuracy remains fairly 
constant over a wide range of situa- 
tions. On the other hand, since the 
effects on leadership behavior of situ- 
ational change are large, psycho- 
metric methods like the LGD, and 
other situational tests which meet 
both requirements, would appear to 
offer more promise initially as psy- 
chometric techniques for forecasting 
leadership behavior. 

With reference to the first require- 
ment, the LGD, in common with a 
range of other situations in which 
leader behavior is to be appraised or 
predicted, appears to share such ele- 
ments as the need for would-be 
leaders: to communicate effectively, 
to overcome inertia, to solve various 
interaction problems, to meet dead- 
lines, and to reach consensus. 

With reference to the second re- 
quirement, the LGD and other situ- 
ational tests, by their very construc- 
tion and administration, tend to vary 
consistently with the nature of the 
examinees and the real-life situations 
for which personnel are being chosen. 
Candidates for positions of leader- 
ship are assessed among administra- 
tors by observing them solving ab- 
stracts of administrative problems 
among administrative trainees. Tests 
of the same individuals for positions
as Army officers would involve
studying them solving military prob-
lems during interaction with officer
candidates. To some extent, there-
fore, situational tests may be able
to take situational variations into
account.


APPLICABILITY OF THE LEADER-
LESS GROUP DISCUSSION


According to published reports, 
the LGD has been used to assess can- 
didates for many professions and 
occupations. The examinees have 
included: officer candidates (2, 41, 
43, 45, 70, 71); OSS agent applicants 
(59); advanced Naval (31, 32), Air 
Force, and Army ROTC cadets (15); 
industrial management trainees (39, 
40, 65); industrial executives and su- 
pervisors (21, 22); shipyard foremen 
(53); supervisory labor mediators 
(42); civil service supervisors and 
administrators (3, 54, 68, 69); ap- 
plicants for foreign service (68, 69); 
graduate engineer trainee applicants 
(9, 10); sales trainee applicants (9, 
10); public health physicians (5, 26, 
27); teachers (47); visiting teachers 
(51); and social service workers
(52).


DESCRIPTION OF THE LEADERLESS 
GROUP DISCUSSION

The basic scheme of the LGD is 
to ask a group of examinees, as a 
group, to carry on a discussion for a 
given period of time. No one is ap- 
pointed leader. The examiners do not 
participate in the discussion, but 
remain free to observe and rate the 
performance of each examinee. To 
date, there has been little standard- 
ization by all examiners of the num- 
ber of discussants per group, the
length of testing time, type of prob-
lems, if any, presented to the candi-
dates, and the directions given to
them. Also, the number of raters has
varied, as has the seating arrange-
ment. Examiners have differed in
the kinds of behavior they have ob- 
served and rated and in the extent 
to which their ratings have been at- 
tempts to describe the behavior they 
have observed, rather than attempts
to make inferences about the per- 
sonality of the candidates. 


Factors Rated by Observers 


Unless otherwise indicated, the 
LGD results we shall describe are 
either based on observers’ ratings of 
the amount of successful leadership 
displayed⁴ in the discussion, or are
inferences about the personalities of 
the candidates based on observations 
of this behavior. 

In regard to personality inferences, 
Couch and Carter (34), following a 
factorial analysis of observers’ infer- 
ences based on several kinds of situ- 
ational tests including the LGD, 
found that three independent factors 
could be isolated which accounted 
for situational test ratings. The fac- 
tors and the ratings with highest 
loadings on each factor were: (a) 
Individual Prominence (authoritari- 
anism, confidence, aggressiveness, 
leadership, striving for recognition); 
(b) Group Goal Facilitation (efficien- 
cy, cooperation, adaptability, pointed 
toward group solution); (c) Group Sociability (sociability,
adaptability, pointed toward group acceptance).

⁴ A leadership act is said to occur when member A of a group behaves
in a way directed toward changing another member, B's, behavior. More
specifically, a leadership act occurs when A's behavior is directed
toward: (a) changing the intensity and/or direction of B's
motivation, and/or (b) restructuring B's abilities to cope with the
situation and reduce B's needs (13).

All attempted leadership acts in which A reaches his goal of changing
B are considered successful leadership. If B's change in behavior
brought about by A's successful leadership leads to need satisfaction
for B and for A (apart from A's satisfaction in being a successful
leader), A's successful leadership act is considered effective (46).

As will be pointed out later, many 
of these specific inferences by obser- 
vers, such as those concerning au- 
thoritarian tendencies, may be valid 
only as descriptions of performance in 
leaderless groups and actually may 
be inversely related to attitudes and 
performance in real life.⁵

A similar factorial analysis of OSS 
situational test data by Sakoda (62) 
uncovered three factors which resemble Couch and Carter's: (a)
Physical Energy (energy and initiative, physical ability,
leadership); (b) Intelligence (effective intelligence, observing and
reporting, propaganda skills); (c) Social Adjustment (social
relations, emotional stability, “security”). Carter (29) noted that similar
factors appeared in several studies 
when real-life leader behavior was 
rated and factor analyzed. 

In aiming to estimate successful leadership behavior, the descriptive
check lists of behavior in leaderless discussions that evolved from
various studies made by the author and associates appear to concern
primarily what Hemphill (46) has labeled “Initiation of Structure,” a
factor which has emerged in several of the Ohio State Leadership
Studies (e.g., 38) and which has similarities to Couch and Carter’s
“Group Goal Facilitation.”

In the check list used in most studies by the author and his
co-workers to assess leader behavior, raters are asked to indicate
whether each candidate showed the following behaviors “a great deal”
(4 pts.), “fairly much” (3 pts.), “to some degree” (2 pts.),
“comparatively little” (1 pt.), or “not at all” (0 pts.): (a) showed
initiative; (b) was effective in saying what he wanted to say; (c)
clearly defined or outlined the problems; (d) motivated others to
participate; (e) influenced the other participants; (f) offered good
solutions to the problem; (g) led the discussion. The sum of ratings
received on all seven items is the examinee’s performance score.

⁵ For a discussion of leadership and authoritarianism, the reader is
referred to Hollander (48).
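The scoring rule just described (seven behaviors, each rated 0 to 4, summed into one performance score) can be sketched as follows. This is a minimal illustration only: the item labels are hypothetical shorthand for behaviors (a) through (g), and the function name is an assumption, not part of the original procedure.

```python
# Hypothetical shorthand labels for the seven check-list behaviors (a)-(g).
ITEMS = (
    "showed_initiative",
    "effective_expression",
    "defined_problems",
    "motivated_others",
    "influenced_others",
    "offered_solutions",
    "led_discussion",
)
MAX_POINTS = 4  # "a great deal" = 4 ... "not at all" = 0


def performance_score(ratings: dict) -> int:
    """Sum one rater's 0-4 ratings over all seven items (range 0-28)."""
    missing = [item for item in ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for item in ITEMS:
        if not 0 <= ratings[item] <= MAX_POINTS:
            raise ValueError(f"rating for {item} must be between 0 and {MAX_POINTS}")
    return sum(ratings[item] for item in ITEMS)


if __name__ == "__main__":
    # Illustrative ratings, not data from any study cited here.
    example = dict(zip(ITEMS, (4, 3, 2, 2, 3, 1, 0)))
    print(performance_score(example))
```

With two observers, the two sums would presumably be averaged or added; the text does not specify which.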

In an unpublished study by Pruitt 
and the author, seven items were 
rated which aimed at assessing the 
extent to which each LGD partici- 
pant showed “consideration for the 
welfare of his associates’’—a factor 
of leader behavior uncovered in vari- 
ous Ohio State Leadership Studies 
(38). This factor is similar to Couch 
and Carter’s Group Sociability and 
Sakoda’s Social Adjustment factors. 
The seven items included: (a) En- 
gaged in friendly jokes and com- 
ments; (b) made others feel at ease; 
(c) complimented others; (d) helped 
others; (e) encouraged others to ex- 
press their ideas and opinions; (f) had 
others share in making decisions with 
him; (g) helped settle conflicts. At the same time, a list of seven
“Initiation” behaviors was observed and rated.

The intercorrelation between 80 LGD participants’ Initiation and
Consideration scores was .78, too high to warrant continued use of
both assessments. More significantly, the mean rating on any single
Initiation behavior was 2.2 points, while the mean rating on any
single Consideration item was 0.75 points. Thus, although raters
could assess both types of leader behavior reliably (reliabilities of
.90 and .85), much less Consideration behavior appeared, and most of
it was exhibited by Initiators.

On the basis of this, and evidence to be presented later, we suspect
that the LGD rating is more an assessment of the tendency to initiate
structure in an initially unstructured social situation (one of
several types of successful leadership behavior) than an assessment
of tendencies to be considerate.


Conditions Affecting LGD Leadership 
Ratings and Performance 


A number of variations have been 
systematically studied which may or 
may not seriously affect ratings of 
successful leadership in the LGD. 
These include: the size of the group, 
the seating arrangement, pretest 
coaching, the motivation of the par- 
ticipants specific to the situation, and 
member participation. 

Effects of size. As the size of the 
group increases from two to twelve, 
the mean LGD rating assigned is 
reduced approximately 50 per cent. 
Eighty-three per cent of the variance 
in ratings where members come from 
groups of 2, 4, 6, 8, and 12 is ac- 
counted for by size according to a 
study of 120 examinees by the au- 
thor and Norton (19). It appears 
that the opportunity to display suc- 
cessful leadership is closely associated 
with the size of the group. We con- 
clude that proper correction must be 
made in any LGD studies where dif- 
ferent examinees have been tested in 
groups varying in size. 
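The text calls for a correction when examinees come from groups of different sizes but does not say which correction to use. One simple possibility, offered purely as an assumption, is to express each rating as a standard score relative to all ratings obtained in groups of the same size. The function name and record layout below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def size_adjusted_scores(records):
    """Convert raw LGD ratings to z-scores within each group-size class.

    records: iterable of (examinee_id, group_size, raw_rating).
    Returns {examinee_id: z_score}. This is an assumed correction,
    not the procedure of any study cited in the text.
    """
    records = list(records)
    by_size = defaultdict(list)
    for _, size, rating in records:
        by_size[size].append(rating)

    # Mean and (population) standard deviation per size class.
    stats = {size: (mean(vals), pstdev(vals)) for size, vals in by_size.items()}

    adjusted = {}
    for examinee, size, rating in records:
        m, sd = stats[size]
        adjusted[examinee] = 0.0 if sd == 0 else (rating - m) / sd
    return adjusted


if __name__ == "__main__":
    # Illustrative ratings only: raw scores run lower in the larger group.
    demo = [("a", 2, 20), ("b", 2, 10), ("c", 12, 10), ("d", 12, 5)]
    print(size_adjusted_scores(demo))
```

After this adjustment, the best performer in a twelve-person group is no longer penalized relative to the best performer in a two-person group merely because larger groups yield lower mean ratings.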

Effects of location of seat and seating 
arrangement. Sixty-eight discussions 
among 467 participants were ana- 
lyzed by the author and Klubeck 
(16) to determine the effects of the 
particular seat a participant held on 
the LGD rating he obtained. For 
both V-shaped arrangements and those in which members sat in parallel
rows facing each other, members seated at the ends obtained slightly
higher mean scores. In two sets of the seven studies included in this
analysis, the results were significant at the 1 per cent level. The
effects tended to disappear when variations in the real-life esteem
of the members were held constant. At any rate, the differences,
statistically significant or not, associated with participants’
location in the group were too small to be of much practical concern.

Pretest coaching. If LGD perfor- 
mance can be greatly altered by 
means of brief coaching which does 
not reflect any real change in per- 
sonality, its routine use by large 
organizations such as the Armed 
Forces for screening OCS applicants 
would be impossible. Therefore, 
Klubeck and the author (50) briefly 
coached the third highest and the 
sixth highest participants (among 
seven) of each of 20 leaderless dis- 
cussions and then ran retests on each 
group of seven. An analysis of co- 
variance led to the inference that 
while those who were fairly high in 
LGD score initially profited signif- 
icantly from coaching, those who 
were initially low did not profit at all 
from the brief coaching. While the 
shift upward of the high ranking 
subjects was statistically significant, 
it was not very large in an absolute 
sense. 

The investigators cited a number 
of reasons for rejecting the inference 
that the differential improvement was 
due to differential motivation, and 
concluded that LGD behavior is a 
function of personality traits and 
needs which cannot be altered readily 
by brief coaching. 


This interpretation was in agreement with Harris’ (45) opinion that
one could not “cram” for the leaderless group discussion. Harris
suggested that priming a candidate, rather than helping him, would
most likely handicap him by “inhibiting his spontaneity.”

In an unpublished study, Pruitt and the author gave the same
part-directive, part-permissive coaching as Klubeck and the author
had administered previously, but the groups receiving training were
coached for 15 minutes while assembled as groups prior to undergoing
the LGD. Directly in line with Harris’ hypothesis, the five trained
groups, each with six participants, showed significantly less
“initiation” behavior than eight untrained control groups of six
each. The trained groups exhibited only half as much “consideration”
behavior as the untrained groups. Raters commented on the “freezing
up” and the “increased nervousness and tensions” which characterized
the trained groups, again as Harris’ hypothesis would lead one to
expect.

As pointed out by Klubeck and the author:
... long term training is obviously an entirely 
different matter. Thus, if an ineffective indi- 
vidual underwent psychotherapy successfully 
which led to favorable modifications in his 
needs and self-esteem, there is no reason to re- 
ject the possibility that he would exhibit im- 
proved performance on the LGD, but here the 
LGD would reflect real personality change 
(50, p. 71). 

Actually, Pepinsky, Siegel, and 
Vanatta (61) carried out such long- 
term training with some success. 

Effects of extrinsic motivation of participants specific to the
situation. Do momentary changes in the incentive to participate,
unrelated to the personality of the participants, make much
difference in LGD performance? An unpublished study completed by the
author tentatively suggests that added extrinsic motivation is
relatively unimportant in determining the behavior of LGD
participants. Two small samples of a total of 31 college students
were tested and retested in leaderless groups of seven to nine. The
first sample was told that the first test was mere practice and had
no bearing on class grades, but that the second test would count as
an important grade determiner. The second sample took the “important”
examination first and was then given the second test as a “check-up
on the test” that would not affect them. The means of successful
leadership displayed in both the motivated and unmotivated situations
were practically identical. Moreover, the correlation between LGD
performance in both situations was .86, indicating that the addition
of incentives affected performance relatively and absolutely very
little.

Needs more basic to the personality probably energize and sustain LGD
behavior. These needs appear to be either openly denied or
unconscious to a great extent. When 227 ROTC cadets, in an
unpublished study by the author and Coates, were asked to indicate on
a five-point scale how much they tried to do as well as possible on
an LGD, a correlation of only .30 was found between reported effort
and actual LGD scores obtained.

As with performance on paper-and-pencil aptitude tests, increasing
the extrinsic motivation of mature subjects does not serve to raise
scores materially, since subjects usually perform near their maximum
without such added incentive. Of course, we might succeed in lowering
performance if we could sufficiently discourage subjects or increase
tension beyond some optimum point.

It is possible that the LGD may be no more sensitive to variations in
examinees’ extrinsic motivation than most aptitude tests. It is
probable that the LGD is less affected by such extrinsic motivation
as the desire to obtain a job than is the usual undisguised
personality test.

What is needed are comparative studies of operational, as opposed to
experimental, validities of the LGD.

Amount of participation. Analyses 
generally find a high correlation be- 
tween the sheer amount of talk of 
LGD participants and the scores they 
earn for successful leadership. Time 
spent talking correlated .65 with rat- 
ings of success for 64 sales and man- 
agement trainee candidates (10), .77 
for 140 sorority girls (23), .93 for 
20 college students (6), and .96 for 
36 college students of an unpublished 
study by R. L. French.⁶

This high correlation is disturbing at first, since it suggests that
LGD ratings primarily discriminate the verbose from the terse.
However, the relationship can be shown by a series of deductions to
follow logically if we assume that almost all participation in the
LGD is attempted leadership behavior, that LGD ratings are
assessments of successful leadership behavior, and that attempted
leadership must occur in order for some of it to be judged
successful. More detailed discussions of these relationships are
presented elsewhere (6, 13).

Kind of participation. Qualitative differences appear in the kinds of
LGD participation engaged in by the successful leader and by those
who participate and attempt leadership acts, but who nevertheless
earn low scores as successful leaders. When the responses during
leaderless discussions of 46 fraternity members were analyzed by the
Bales technique (4), other judges’ ratings of participants’
successful leadership correlated .66 with frequency of attempting
answers; .50 with frequency of positive socioemotional responses; .44
with frequency of asking questions; and .32 with frequency of
negative socioemotional responses (12). The roles associated with
successful leadership were initiator-contributor, opinion-giver,
elaborator, compromiser, orienter, evaluator, energizer, encourager,
etc. (11). Only one type of participation, attempting answers, was
associated (r = .48) with the real-life esteem of the participants.

In a study of 40 ROTC cadets, Carter, Haythorn, Shriver, and Lanzetta
(32) found that LGD leaders were much more likely than nonleaders to
diagnose the situation, ask for expressions of opinion, propose
courses of action for others, support and defend their proposals,
give information, express opinions, and argue with others.

⁶ FRENCH, R. L. Verbal output and leadership status in initially
leaderless discussion groups. Amer. Psychologist, 1950, 5, 310.
(Abstract)


RELIABILITY OF THE LEADERLESS GROUP DISCUSSION

The reliability of the LGD will be considered in two phases: rater
agreement and test-retest reliability. Wherever possible we will
indicate factors that systematically influence these reliability
estimates.


Rater Agreement 


Table 1 displays the average agreement found between any two
observers in rating the first LGD administered to the designated
subjects. The results suggest that the refined 7-item check list and
its predecessors of 9 and 14 items used in most of the studies by the
author and associates (e.g., 15) yield a consistent rater correlation
of between .82 and .84. LGD scores based on two raters using the
check list method yield a satisfactory estimated reliability of .90
or above.

A number of factors influence the agreement between raters. These
will be considered next.


TABLE 1
RELIABILITY OF LGD RATINGS ESTIMATED BY THE AVERAGE AGREEMENT
BETWEEN ANY TWO RATERS

Subjects | What Rated | Rating Method | Average Correlation between Any Two Observers | Estimated Reliability of LGD Rating Using Two Observers
1. sales and mgt. trainee candidates (9) | Desirability for position | Paired comparison | .67 |
2. administrative trainee candidates (3) | Personality and ability (highly intercorrelated) | Graphic rating scale | .70 |
3. male college students (30) | Leadership performance | Graphic rating scale | |
4. NROTC subjects (31) | Leadership, authoritarianism, initiative, and insight (highly intercorrelated) | Graphic rating scale | |
5. mixed college students (6) | Successful leadership displayed | Check list (13-item) | |
6. fraternity members (20) | Successful leadership displayed | Check list (14-item) | |
7. mixed college students (19) | Successful leadership displayed | Check list (9-item) | |
8. ROTC cadets (15) | Successful leadership displayed | Check list (9-item) | |
9. fraternity pledges (72) | Successful leadership displayed | Check list (9-item) | |
10. sorority members (23) | Successful leadership displayed | Check list (7-item) | |
11. mixed college students* | Initiation of structure and interaction | Check list (8-item) | |
12. 48 mixed college students* | Consideration of others | Check list (8-item) | |

* Unpublished data.
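The step from an average two-observer correlation of .82 to .84 to an estimated reliability of .90 or above for scores based on two raters is consistent with the Spearman-Brown formula. The text does not name the formula, so its use here is an assumption; the function name is hypothetical.

```python
def spearman_brown(r_single: float, k: int = 2) -> float:
    """Estimated reliability of a score pooled over k raters,
    given the average correlation r_single between any two raters."""
    return k * r_single / (1 + (k - 1) * r_single)


if __name__ == "__main__":
    # Correlations quoted in the text and in the legible rows of Table 1.
    for r in (0.67, 0.70, 0.82, 0.84):
        print(f"r = {r:.2f} -> two-rater reliability = {spearman_brown(r):.2f}")
```

Under this assumption, .82 and .84 step up to about .90 and .91, and the paired-comparison value of .67 steps up to about .80.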


Effects of discussion effectiveness. A pronounced curvilinear
relationship was found between the rated effectiveness of 67
leaderless discussions and the correlation between two observers’
ratings for each discussion. Rater agreement was highest for
discussions of average effectiveness and was lowest for discussions
which were either extremely effective or extremely ineffective. The
average eta found for four subsamples of these 67 discussions was .68
(17). It may be that observers become too interested in the very
effective discussions and too bored or detached from the very
ineffective ones; or, the variance of LGD ratings may be reduced in
discussions at either extreme of the distribution of effectiveness,
which, in turn, will serve to reduce the reliability of the ratings.
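Eta, the correlation ratio, is the usual index for a curvilinear relationship of this kind: it compares the variation between categories of one variable with the total variation in the other. The sketch below assumes discussions have been sorted into effectiveness bands and that each band holds the rater-agreement coefficients of its discussions. The function name and the illustrative numbers are hypothetical and are not data from study (17).

```python
from math import sqrt
from statistics import mean


def correlation_ratio(groups):
    """Eta: sqrt(between-group sum of squares / total sum of squares).

    groups: list of lists, one list of values per category
    (here, one list of rater-agreement coefficients per effectiveness band).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return sqrt(ss_between / ss_total) if ss_total else 0.0


if __name__ == "__main__":
    # Made-up agreement coefficients: low, average, and high effectiveness.
    bands = [[0.60, 0.70], [0.88, 0.92], [0.62, 0.72]]
    print(correlation_ratio(bands))
```

Unlike a product-moment correlation, eta stays high for an inverted-U pattern such as the one described, where agreement peaks for discussions of average effectiveness.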

Effects of size. The number of participants in a given discussion
appears to influence the extent to which observers will agree with
each other. The author and Norton (19) tested five samples of 24
subjects each in discussions of two, four, six, eight, and twelve in
size. Maximum agreement among observers was reached in groups of six
(r=.89) and was lowest in groups of two (r=.72). Carter, Haythorn,
Meirowitz, and Lanzetta (31) found that mean observer agreement was
only .70 for four-man groups, while it was .85 when the same men were
retested in eight-man groups.

Effects of test or retest. The last results cited can also be
accounted for, in part, by the fact that observer agreement appears
to increase when the same subjects are retested. Thus, when the 120
subjects of the author and Norton were all retested, observer
agreement increased for groups of all sizes from a mean correlation
of .82 to a mean correlation of .90.

Effects of the object of rating. The data summarized in Table 1
suggest that rater agreement is higher where raters employ a check
list in which they merely indicate the extent to which each of a
number of items of leader behavior was exhibited by each candidate,
rather than where they employ a single graphic rating, or where they
attempt to make inferences about the standing of the examinees on
intervening variables of personality or ability supposedly underlying
the LGD behavior being observed. The median estimated reliability of
check list ratings is .90; the median estimated reliability of other
types of ratings is .80.


Test-Retest Reliability 


The available seven test-retest reliability coefficients and the
studies on which they are based are listed in Table 2 in order of the
size of the coefficients. Listing them by size leads to the inference
that test-retest reliability is higher the more similar the test and
retest situations. The consistency of LGD behavior is higher the less
group membership changes from test to retest; the less the problem
changes; the less some members are increased in ability to lead; the
less some members are increased in “real-life” status; the less
observers are changed; and the shorter the interval between tests,
since a longer interval permits more random or biasing change to
occur among participants. These results conform to the principle of
consistency, proposed by the OSS Assessment Staff, that a subject
will respond to similar environmental conditions in a similar manner
(59).

Where changes in situation from test to retest are reduced to a
minimum, a high test-retest reliability is found. It is probable that
where behavior check lists describing participant behavior are used,
the true test-retest reliability of the LGD is somewhere between .75
and .90.

Effects of size. The number of participants in a group appears to
determine the consistency of the behavior from test to retest. For,
while the author and Norton (19) found groups from four to twelve in
size to have an average test-retest reliability of above .90, groups
of two had a corresponding reliability of only





These results suggest that two-man and probably three-man leaderless
group discussions should not be used for assessment purposes.


TABLE 2
TEST-RETEST RELIABILITY OF RATINGS OF LEADERSHIP DISPLAYED IN
LEADERLESS GROUP DISCUSSIONS RANKED IN ORDER OF SIMILARITY OF TEST
AND RETEST

Subjects | Rating Method | Differences between Test Situations
1. mixed college students (19) | Check list (9-item) | None
2. mixed college students* | Check list (9-item) | One “important exam,” other not
3. sorority members (23) | Check list (7-item) | Two members of each group of 7 given training between tests
4. mixed college students (24) | Participants rank each other | Groups rearranged, six retests
5. mixed college students (6) | Check list (13-item) | Intervening LGD among leader or followers only before second test
6. mixed college students (21) | Check list (7-item) | Group rearranged, type of problem systematically varied: case history, no problem presented, and leader specification to be outlined
7. 172 ROTC cadets | Check list (9-item) | Groups rearranged; some subjects changed in “real-life” status more than others between tests; different raters
8. 36 male college students (30) | Rating scale | Six intervening situational tests in groups of two before second test

Interval between Tests (entries in source order; row assignment
unclear): Week; Week; Three hours; Week; Six weeks; Two days to week;
Three to four months.
Correlations (only legible entry): .90

* Unpublished data.


VALIDITY OF THE LEADERLESS GROUP DISCUSSION


The construct validity of the LGD will be examined in this
discussion. This requires both logical and empirical review.

FIG. 1. RELATIONS AMONG LEADERSHIP VARIABLES (legend: marked entries
are measurable variables)

Figure 1 diagrams the relationships among a group of variables of
importance in the study of leadership. Using a set of postulates
based primarily on learning theory, the author elsewhere (13) has
deduced these relationships, which may be summarized as follows:


1. The more a member is able to solve the group's problems because of
personal characteristics (such as his capacity, achievement,
responsibility, and participation), the more likely he is to exhibit
successful leadership behavior in real life, in quasi-real
situations, and in the leaderless group discussion. These personal
characteristics are reflected in performance on various psychological
tests of intelligence, proficiency, and personality.

2. The more a member exhibits successful 
leadership, the higher is his esteem among his 
associates—the extent to which he is regarded 
of worth as a member or leader to the group, 
regardless of his position—and the higher will 
be the merit ratings he receives as a successful 
leader or member. The higher his esteem, the 
more likely he is to be of further success as a 
leader among his associates. 

3. The higher a member’s status, as inferred 
from his rank or the worth of his position 
among his associates, the more likely he is to 
successfully lead his associates. 


Further relations between variables noted in Fig. 1 can be ignored
here.

If ratings of LGD performance are actually valid measures of
tendencies of individuals to differ in successful behavior, and if we
accept as a logical rule that variables with common determinants
should correlate positively with each other, then following the
outline of Fig. 1, LGD scores should correlate:

1. With real-life rank (and hence real-life status) when the LGD is
among associates of different ranks;

2. With real-life merit ratings (hence real-life esteem);

3. With leadership performance in other quasi-real situations;

4. With observations and indices of successful leadership performance
in real-life situations;

5. With personal characteristics as measured by psychological tests
and measurements commonly associated with success as a leader.


Any positive correlation between an LGD rating and these other
specified measures should provide partial evidence of validity of the
LGD performance rating, namely that it actually measures leadership
potential or individual differences in tendency to be successful as a
leader. Previously presented evidence indicates that the measurement
of LGD performance is consistent with itself. The question still to
be answered is whether or not it is consistent with the various other
measurements associated with, described as, or defined as successful
leadership behavior. Rated LGD performance should be associated with
these other measures if it is an assessment of success as a leader.

We will now survey empirical investigations of the extent to which
rated LGD performance was found associated with status and esteem in
real life, personal characteristics, and leadership performance
elsewhere.


Status (as Estimated by Rank) and 
LGD Performance 


A biserial correlation of .88 was found between the rank in the
company of each of 131 oil refinery supervisors and their success as
LGD leaders among their associates (22). The more the discussion
problem concerned matters for which they had rank over their
associates, the higher was this correlation (21). A corresponding
correlation of .51 was found for 264 ROTC cadets.⁷ The lower
correlation in ROTC probably reflected the fact that rank differences
were less vital to the cadets than to the industrial executives.

When 180 ROTC cadets were retested among their associates a year
after an original test, those who rose during the year from cadet
noncom to cadet first lieutenant or higher gained significantly more
in LGD score on the retest compared to the test than those who
received promotions to cadet second lieutenant only.⁸


Esteem as Estimated by Real-Life 
Merit Ratings and LGD Per- 
formance 


Table 3 lists 17 correlations between LGD performance and
esteem-in-real-life as estimated by merit ratings. It also shows when
and how the ratings of merit were obtained. The median correlation is
.39 and is raised to .51 when only the seven cases in which
correction was made for the unreliability of esteem ratings are
considered. As shown in Table 3, LGD scores have been found
moderately predictive of merit as an ROTC cadet officer (15),
sorority or fraternity member (20, 23, 72), civil service
administrator (3, 69), shipyard foreman (55), foreign service
administrator (69), and OCS cadet officer (71). The moderate
correlations between LGD scores and real-life esteem for the studies
that involved discussions among strangers suggest that a common
source of variance among examinees, which exists beyond the effects
of situation, underlies an examinee’s merit among his real-life
associates and his success as a leader among strangers. The
relationship between esteem and LGD score cannot be attributed solely
to the tendency of an examinee to display successful leadership among
associates who esteem him. Thus, the LGD appears to assess attributes
of the examinee that are not specific either to the test situation or
to the group in which he is tested.

⁷ Bass, B. M., & Coates, C. H. Situational and personality factors in
leadership in ROTC. Unpublished manuscript.

⁸ See footnote 7.
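The correction for unreliability of the esteem ratings mentioned above is presumably the classical correction for attenuation applied to the criterion only. The text does not give the formula or the criterion reliabilities, so both the formula choice and the numbers below are assumptions for illustration.

```python
from math import sqrt


def correct_for_criterion_unreliability(r_observed: float, criterion_reliability: float) -> float:
    """Classical correction for attenuation, criterion side only:
    r_corrected = r_observed / sqrt(criterion_reliability)."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_observed / sqrt(criterion_reliability)


if __name__ == "__main__":
    # Hypothetical values: an observed validity of .40 against merit
    # ratings whose own reliability is .64.
    print(correct_for_criterion_unreliability(0.40, 0.64))
```

The less reliable the merit ratings, the larger the upward adjustment, which is in keeping with the corrected median (.51) exceeding the overall median (.39).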
Variables related to the correlation between esteem and LGD
performance. The total amount of successful leadership displayed in a
leaderless discussion appears to reflect the average merit as leaders
elsewhere of all the participants of the discussion. An analysis of
67 LGD’s among fraternity pledges, sorority members, and ROTC cadets
found a correlation of .35 between the mean LGD score per discussion
group and the mean esteem-in-real-life score of the participants of
each group. Thus, there was a between-groups as well as a
within-groups positive correlation between LGD ratings and esteem as
estimated by merit ratings. In the same way, the LGD scores within a
discussion correlated .20 with the merit scores among the
participants of that discussion (17).

⁹ Another interpretation of these findings has been offered by E. L.
Kelly in the 1954 Annual Review of Psychology (Stanford, Calif.:
Annual Reviews, Inc., p. 295). Kelly suggests that to some extent,
the real-life merit raters may react in the same way to the same cues
irrelevant to leadership success, as do the LGD raters. Both
assessments are in agreement, but the source of agreement does not
necessarily concern leadership potential. Thus, in a given situation,
real-life merit raters and LGD raters both may tend to assign high
ratings to thin men, and logic and the literature on the subject
suggest that thinness should have no relation to leadership
potential.

A counterargument is as follows: The real-life merit ratings, biased
as they are, are a function of the extent to which the raters value
or esteem the ratees. The evaluation tends to have consequences
affecting the continuing success of the performance under evaluation.
If seemingly irrelevant cues such as thinness influence merit ratings
in real life, they will then tend to be associated with esteem and
leadership potential. In reacting to the same biasing cues, the LGD
observers are in error as far as logic or psychologists are
concerned, but, despite this, their error is associated with
real-life leadership potential as well as with the biased real-life
merit ratings.

This same counterargument will not apply where merit ratings have no
future consequences on the success of the actual performance being
rated, such as in the case of appraising diagnostic proficiency of
physicians.

These results suggest that LGD 
ratings which depend solely on stand- 
ards within a group discussion should 
suffer in validity as estimated by cor- 
relations with real-life merit ratings. 
Rating techniques with this disad- 
vantage include the forced distribu- 
tion rating, paired comparison, rank- 
ing, and any others which force the 
equalization of all discussion-group 
LGD score means and/or variances. 
Similarly, when tested among strang- 
ers, any ratings of each other by the 
participants themselves will be at- 
tenuated in validity as predictions of 
esteem, since they will depend solely 
upon standards based on observation 
of a single discussion. 

This same analysis (17) indicated 
that a number of variables are associ- 
ated with the variation from discus- 
sion to discussion in the correlation be- 
tween real-life esteem of the members 
and their LGD performance. Accord- 
ing to a Doolittle solution, these 
variables included: within-discussion 
variance in real-life esteem; within- 
discussion variance in LGD ratings; 
and group size. All relationships were 
positive except for size. Six-man 
groups were slightly more valid than 
larger ones as predictors of real-life 
esteem. 
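
For readers unfamiliar with the term, the Doolittle solution is, in modern terms, an LU factorization with a unit lower-triangular factor, long used to solve the normal equations of a multiple regression. The sketch below is a minimal illustration under stated assumptions: the intercorrelation matrix and the validities are invented, the names are hypothetical, and no claim is made that it reproduces the analysis reported in (17).

```python
def doolittle_solve(a, b):
    """Solve a x = b by Doolittle LU factorization (no pivoting).

    Adequate for well-conditioned, positive-definite matrices such as a
    matrix of intercorrelations among predictors.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of the upper factor.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        # Column i of the unit lower factor.
        lower[i][i] = 1.0
        for j in range(i + 1, n):
            lower[j][i] = (
                a[j][i] - sum(lower[j][k] * upper[k][i] for k in range(i))
            ) / upper[i][i]

    # Forward substitution: lower * y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][k] * y[k] for k in range(i))

    # Back substitution: upper * x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))) / upper[i][i]
    return x


# Hypothetical intercorrelations among three predictors (variance in esteem,
# variance in LGD ratings, group size) and their hypothetical correlations
# with the criterion.
intercorrelations = [
    [1.0, 0.3, -0.2],
    [0.3, 1.0, -0.1],
    [-0.2, -0.1, 1.0],
]
validities = [0.4, 0.3, -0.2]
weights = doolittle_solve(intercorrelations, validities)
print([round(w, 3) for w in weights])
```

The values returned are standardized regression weights; a negative weight for the third predictor would correspond to the negative relationship reported for group size.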

Status differences among members 
almost completely invalidate the 
LGD as an indicator of real-life esteem.
Yet above and beyond these
effects, the case history discussion
appears more likely than other types 
of discussions to reflect differences 
in real-life esteem and personality. 
Where members of different rank 
were tested together, the case history 
discussion was the only one which 
yielded scores that correlated posi- 
tively with merit ratings as refinery 
supervisors (r=.28). Furthermore, 
case history discussion performance 
correlated .54 with a supervisory 
aptitude test battery. Other types of 
discussion, in these circumstances, 
averaged .23 in correlation with 
supervisory aptitude test scores, and 
correlated negatively with ratings of 
esteem of these supervisors of differ- 
ent rank (21). 


Personal Characteristics Associated 
with Leadership and LGD Per- 
formance 


According to Stogdill’s survey of over one hundred studies, leaders
tend to surpass nonleaders in certain personal characteristics such as
capacity (intelligence, alertness, verbal facility, originality, and
judgment), achievement (scholarship and knowledge), responsibility and
associated personality factors (dependability, initiative, persistence,
aggressiveness, self-confidence, desire to excel), and participation
(activity, sociability, cooperation, and adaptability) (63). If these
characteristics can be included under the broad concept of “abilities
to solve group problems,” then their relationship to leadership can be
deduced as well (13). Performance in the LGD should be associated with
these personal factors, if it is to be judged valid as a measure of
individual differences in tendency to exhibit successful leadership
behavior. Table 4 shows the correlations obtained between various
measures of capacity and/or achievement and LGD performance. Table 5
shows the correlations between LGD performance and various personality
variables that approximate the “responsibility” and “participation”
clusters of Stogdill. The first three items of Table 7 may be regarded
as further evidence of the correlation between participation and LGD
performance. We shall briefly consider, in turn, the correlations
between LGD performance and capacity, achievement, responsibility, and
participation.


TABLE 4

CORRELATION BETWEEN LGD PERFORMANCE AND MEASURES OF CAPACITY
AND ACHIEVEMENT

Subjects | Test of Capacity or Achievement | Factor(s) Most Probably Measured by Test or Measurement
sorority members (23) | ACE Linguistic | Verbal aptitude
fraternity pledges (72) | ACE Linguistic | Verbal aptitude
sales and management trainee candidates (10) | OSPE | Verbal aptitude
sorority members (23) | ACE Quantitative | Numerical aptitude
fraternity pledges (72) | ACE Quantitative | Numerical aptitude
ROTC cadets* | ACE Total | Verbal and numerical aptitude (intelligence)
administrator candidates (68) | South African Air Force test of mental alertness | Intelligence
oil refinery supervisors (22) | Otis Gamma | Verbal, spatial, numerical aptitude (intelligence)
foreign service candidates (69) | Cognitive Test Battery | Intelligence
administrator candidates (69) | Cognitive Test Battery | Intelligence
foreign service candidates (69) | Civil Service Qualifying Exam | Intelligence and scholastic achievement
administrator candidates (69) | Civil Service Qualifying Exam | Intelligence and scholastic achievement
sorority members (23) | Years of education | Scholastic achievement
oil refinery supervisors (22) | Years of education | Intelligence and scholastic achievement
ROTC cadets | Average grade in college | Scholastic achievement
sorority members (23) | Average grade in college | Scholastic achievement
oil refinery supervisors (22) | Supervisory aptitude test | Supervisory aptitude
mixed college students (64) | How Supervise? | Supervisory knowledge

* Unpublished data.
[Correlation column illegible.]


Capacity and leaderless group discussion
performance. While the correlation
between LGD performance
and verbal aptitude averages .30
or above, as might be expected, the
correlation between LGD behavior
and numerical aptitude is below .20.
Intelligence test scores measuring
verbal, numerical, and spatial fac- 
tors tend to fall in between in cor- 
relation with LGD performance. 
Ability to solve the leaderless dis- 
cussion group’s problems appears to 
depend more on verbal than on other 
aptitude factors. 

A factor analysis of 14 leadership
and ability measures by the author
and Coates¹⁰ found that while rated
performance in initially leaderless
discussions correlated close to zero with
one factor, Ability in Active Situations,
it correlated .44 with another
factor, Ability in Verbal Situations.
Sakoda’s (62) analysis of OSS data
also noted that discussion ratings
fell into a cluster with other ratings
based on “verbal” situations. Carter,
Haythorn, and Howell (30) arrived
at the same conclusion following
factorial analyses of several criteria
of leadership. These findings
may indicate the boundaries to the
range of real-life situations in which
leadership behavior can be forecast
by rated LGD performance.

The 10 available correlations between
intelligence and associated
aptitude test scores and success in 
the LGD appear consistent with ex- 
pectations. Yet, verbal aptitude only 
accounts for 10 to 15 per cent of the 
variance in LGD, and so cannot be 
regarded as a more easily adminis- 
tered substitute for predicting success 
as a leader. 
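
The step from a correlation to a share of variance is the squaring of the coefficient. A minimal Python sketch of that arithmetic follows; the function names are illustrative only and the coefficients in the loop are arbitrary round figures, not tabled results.

```python
import math


def variance_explained(r):
    """Proportion of variance in one measure associated with the other."""
    return r * r


def correlation_for_share(share):
    """Size of correlation needed to account for a given share of variance."""
    return math.sqrt(share)


for r in (0.30, 0.35, 0.39):
    print(f"r = {r:.2f} accounts for {variance_explained(r):.1%} of the variance")

print(f"10 per cent requires r = {correlation_for_share(0.10):.2f}")
print(f"15 per cent requires r = {correlation_for_share(0.15):.2f}")
```

On this arithmetic, 10 to 15 per cent of the variance corresponds to correlations of roughly .32 to .39, which leaves 85 to 90 per cent of the variance in LGD performance unaccounted for by verbal aptitude.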

Achievement and leaderless group
discussion performance. Six studies
are available of the correlation between
LGD ratings and tested
achievement, years of education, or
grades in college. Again, the correlations
are uniformly positive, ranging
from .16 to .31, with a median of .25.
If the unusually high correlation of
.57 is ignored because it is contaminated
by the correlation of rank with
both education and LGD success, the
median becomes .20.

10 See footnote 7.


Above and beyond the effects of 
rank, a correlation of .30 appears to 
exist between LGD performance and 
supervisory aptitude as measured by 
an optimally weighted battery of 
interest, biographical, and super- 
visory judgment tests. 
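
A correlation "above and beyond the effects of rank" is ordinarily obtained as a first-order partial correlation. The sketch below assumes that formula; the procedure actually used is not stated in this passage, and the three input correlations are invented for illustration.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: x = LGD rating, y = supervisory aptitude, z = rank.
example = partial_correlation(r_xy=0.50, r_xz=0.60, r_yz=0.50)
print(f"partial correlation with rank held constant: {example:.2f}")
```

When both measures are themselves correlated with rank, the partial coefficient is smaller than the zero-order coefficient, which is the pattern one would expect when status differences inflate the raw relationship.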

How Supervise?, a fairly widely 
used test of knowledge of principles of 
supervision, which aims to predict 
success as an industrial supervisor, 
correlates .46 with the LGD success 
of 66 college students (64). 

Personality and leaderless group dis- 
cussion performance. We shall now 
consider correlations obtained be- 
tween LGD performance and the 
specific personality traits found to be 
associated with leadership according 
to Stogdill. 

Stogdill cites five studies in which 
leaders are found to be more energetic 
than nonleaders. For the LGD, a 
correlation of .15 was found between 
general activity and energy as as- 
sessed by the Guilford-Zimmerman 
Temperament Survey for 76 sorority 
girls. A corresponding correlation of 
.12 was found for 66 college students 
(64). Also, LGD leaders were more 
often characterized in a Rorschach 
analysis as highly energetic, while 
LGD nonleaders more often were de- 
scribed as lazy or passive (18). 

Stogdill uncovered a large number 
of studies that found successful 
leadership associated with original- 
ity, soundness of judgment, and abil- 
ity to evaluate situations. Rorschach 
analysis characterized LGD leaders 
as strongly imaginative, strongly 
interested in details, and able to see 
the larger aspects of things, and LGD 
nonleaders as stereotyped or conven- 
tional in thoughts and perceptions, 
as unclear, plodding, and confused 
thinkers, and as unimaginative. A 
study of 172 ROTC cadets found an
eta of .30 between the F scale of 
authoritarianism and LGD perform- 
ance. Highly authoritarian—hence 
stereotyped and rigid—personalities 
did extremely poorly in the discus- 
sion, while equalitarian—but not too 
equalitarian—cadets earned the high- 
est LGD scores. Among samples of 
100 and 67 of these same cadets, 
Pearson correlations of .32 and .33 
were found between LGD performance
and Thurstone’s Concealed Figures and
Gestalt Completion tests, both tests
of perceptual flexibility.¹¹

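
Eta, the correlation ratio, is the appropriate index when a relationship is curvilinear, since a product-moment coefficient registers only the linear trend. A minimal sketch of the computation follows, using invented LGD scores for three hypothetical bands of the F scale (not the cadet data); the function names are illustrative.

```python
from statistics import mean


def correlation_ratio(groups):
    """Eta: square root of between-band variation over total variation."""
    scores = [y for g in groups for y in g]
    grand = mean(scores)
    ss_total = sum((y - grand) ** 2 for y in scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return (ss_between / ss_total) ** 0.5


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical LGD scores by F-scale band, ordered from most equalitarian
# to most authoritarian. The middle band scores highest.
bands = [[5, 6, 7], [8, 9, 10], [1, 2, 3]]

eta = correlation_ratio(bands)
band_codes = [i for i, g in enumerate(bands) for _ in g]
lgd_scores = [y for g in bands for y in g]
r = pearson(band_codes, lgd_scores)
print(f"eta = {eta:.2f}, Pearson r = {r:.2f}")
```

In data of this shape, eta exceeds the absolute value of the product-moment coefficient because the best performance falls in the middle of the scale and not at either extreme.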
Self-assurance and absence of 
modesty were uniformly associated 
with leadership in 17 studies cited 
by Stogdill. Similarly, self-esteem 
as measured by 140 self-nominations 
for sorority leadership positions cor- 
related .29 with LGD performance. 
An analysis of interviews with nine
LGD leaders and nine LGD nonleaders
from a total of 140 subjects
using the Who-Are-You technique
(28) showed that, compared with nonleaders,
LGD leaders more frequently
regard themselves, their effects on
others, and other persons’ effects on
them in a more favorable light.


Discrepancies between Stogdill’s 
conclusions concerning leadership in 
general and LGD personality corre- 
lations arise when we consider per- 
sonality characteristics such as emo- 
tional stability, sociability, ascend- 
ency, and responsibility. 

Eleven studies have found emotional
stability to be associated with
leadership, although the evidence is
not uniformly positive (63). Similarly,
LGD performance correlated
.20 and .17 with emotional stability
as measured by the Guilford-Zimmerman
inventory (18, 64). Rorschach
analysis likewise characterized more
LGD leaders than nonleaders as
emotionally stable. Overt social adjustment
based on peer ratings correlated
.28 with LGD performance.
On the other hand, five correlations
found between LGD performance and
the inventoried traits of freedom
from hypersensitivity and hostility
ranged from -.40 to .29 with a median
of -.04 (18, 64).

11 See footnote 7.

Responsibility was uniformly re- 
ported by Stogdill to be associated 
with leadership. But a correlation 
of —.29 was found for 47 ROTC 
cadets between LGD ratings and re- 
sponsibility as measured by the 
Gordon Personal Profile, while cor- 
relations of .19 and .26 were found 
in two analyses of the relations be- 
tween thoughtfulness as assessed by 
the Guilford-Zimmerman and LGD 
performance (18, 64). 

Evidence concerning the relation 
between extroversion, ascendency, 
and leadership in general is contra- 
dictory according to Stogdill (63). 
Ascendency as measured by the 
Guilford-Zimmerman Temperament 
Survey correlated .44 with LGD 
performance of 76 sorority girls (18) 
and .25 with LGD performance of 
mixed college students (64); but as- 
cendency as measured by the Gordon 
Personal Profile and the A-S Reaction 
Test correlated only .02 and —.02, 
respectively, with LGD performance. 

Sociability as measured by the 
Guilford-Zimmerman in two studies 
(18, 64) correlated .27 and .31, re- 
spectively, with LGD performance; 
similar correlations between cooper- 
ativeness and LGD success were .14 
and .13, while sociability as measured 
by the Gordon Personal Profile cor- 
related close to zero with LGD per- 
formance. 

Finally, while Stogdill found ambition
and desire to excel important
attributes of leadership, an average
correlation of -.05 was found between
LGD performance and expressed
desire to hold sorority or
university student offices (23).


TABLE 5

CORRELATION BETWEEN LGD PERFORMANCE AND PERSONALITY TESTS OR MEASUREMENTS

Subjects | Personality Test or Measurement | Trait Probably Measured | Correlation
sorority members (18) | Guilford-Zimmerman Temperament Survey (G) | General activity and energy | .15
mixed college students (64) | Guilford-Zimmerman Temperament Survey (G) | General activity and energy | .12
sorority members highest and lowest on LGD (18) | Rorschach | Imagination, strongly interested in details, able to see larger aspects of things vs. conventionality, stereotypy, etc. | [illeg.]
ROTC cadets (15) | UCPOC F scale | Authoritarianism, rigidity | .30†
ROTC cadets* | Concealed Figures | Perceptual flexibility | .32
ROTC cadets* | Gestalt Completion | Perceptual flexibility | .33
sorority members (23) | Self-nominations for leadership positions in sorority | Self-esteem | .29
sorority members highest and lowest on LGD (18) | W-A-Y interview | Self-esteem | ‡
sorority members (18) | Guilford-Zimmerman Temperament Survey (E) | Emotional stability | .20
mixed college students (64) | Guilford-Zimmerman Temperament Survey (E) | Emotional stability | .17
sorority members (23) | Peer ratings | Overt social adjustment | .28
sorority members (18) | Guilford-Zimmerman Temperament Survey (F) | Friendliness, agreeableness | [illeg.]
mixed college students (64) | Guilford-Zimmerman Temperament Survey (F) | Friendliness, agreeableness | [illeg.]
ROTC cadets* | Gordon Personal Profile (H) | Freedom from hypersensitivity | [illeg.]
sorority members (18) | Guilford-Zimmerman Temperament Survey (O) | Freedom from hypersensitivity | [illeg.]
mixed college students (64) | Guilford-Zimmerman Temperament Survey (O) | Freedom from hypersensitivity | [illeg.]
sorority members (18) | Guilford-Zimmerman Temperament Survey (P) | Cooperativeness | .14
mixed college students (64) | Guilford-Zimmerman Temperament Survey (P) | Cooperativeness | .13
sorority members (18) | Guilford-Zimmerman Temperament Survey (S) | Sociability | .27
mixed college students (64) | Guilford-Zimmerman Temperament Survey (S) | Sociability | .31
ROTC cadets* | Gordon Personal Profile (S) | Sociability | [illeg.]
ROTC cadets* | Gordon Personal Profile (R) | Responsibility | -.29
sorority members (18) | Guilford-Zimmerman Temperament Survey (T) | Thoughtfulness, reflectiveness | .19
mixed college students (64) | Guilford-Zimmerman Temperament Survey (T) | Thoughtfulness, reflectiveness | .26
sorority members (18) | Guilford-Zimmerman Temperament Survey (A) | Ascendency | .44
mixed college students (64) | Guilford-Zimmerman Temperament Survey (A) | Ascendency | .25
ROTC cadets* | Gordon Personal Profile (A) | Ascendency | .02
mixed college students (64) | A-S Reaction Test | Ascendency | -.02
sorority members (23) | Ratings of motivation for various offices | Motivation to lead | -.05
uncoached college students* | Check list descriptions | Parental initiation | [illeg.]
coached college students* | Check list descriptions | Parental initiation | [illeg.]
uncoached college students* | Check list descriptions | Parental consideration | [illeg.]
coached college students* | Check list descriptions | Parental consideration | [illeg.]
uncoached college students* | Kerr Empathy Test | Social knowledge | [illeg.]
coached college students* | Kerr Empathy Test | Social knowledge | [illeg.]
coached college students* | Dymond Empathy Test (modified) | Accuracy of estimated self-ratings of others | [illeg.]
coached college students* | Dymond Empathy Test (modified) | Accuracy of estimated group ratings of self | .36
coached college students* | Dymond Empathy Test (modified) | Accuracy of estimated group ratings of others | [illeg.]
sorority members (18) | Guilford-Zimmerman Temperament Survey (M) | Masculinity-femininity | [illeg.]
mixed college students (64) | Guilford-Zimmerman Temperament Survey (M) | Masculinity-femininity | [illeg.]

* Unpublished data.
† Curvilinear relationship. Highly authoritarian subjects perform worst on LGD. Maximum LGD success experienced by equalitarian (but not too equalitarian) subjects.
‡ Chi-square analysis suggested that a significant positive correlation existed between the self-esteem of interviewees and their LGD scores.

It may be inferred that some con- 
sistency exists between Stogdill’s 
generalizations of the relations be- 
tween leadership in general and such 
personality traits as energy, flexibil- 
ity of judgment, and self-esteem, and 
the relations between these traits and 
LGD performance. Contradiction or 
lack of uniformity appears when we 
consider such traits as responsibility, 
emotional stability, ascendency, and
sociability.¹²

12 Part of this lack of uniformity may be
due to variations among techniques used to
measure the various personality traits, and
variations in the sex composition of the samples
studied.

The responsibility, ascendency, sociability,
and freedom from hypersensitivity scales of
the Gordon Personal Profile, a forced-choice
personality inventory, administered to samples
of men only, tended to correlate zero or
negatively with LGD success. Corresponding
Guilford-Zimmerman scales with the same or
similar names, of the traditional self-report
type, tended to correlate positively with the
LGD success of a sample of women only (18)
and with the LGD success of mixed men and
women (64).

Assuming that Gordon’s forced-choice procedure
is less subject to distortion than the
Guilford-Zimmerman, and assuming that the
measurement techniques rather than the samples
were a significant source of variance, one
might speculate that LGD performance tends
to be associated with a participant’s concept
of himself, but only where the participant is
free to distort the description to suit himself.

To go one step further, it may be that LGD
success is related to the way a participant
likes to see himself, but not to the way he
actually sees himself when forced to make
discriminations over which he has less control.

Neither of these self-inventoried evaluations
actually meets the requirements of the original
hypothesis that the more able member is
more likely to be esteemed and to be successful
as a leader. The crucial evaluations of
ability for leadership and esteem are those
based not on self-evaluation, but on other
group members’ judgments of the participant,
or on objective tests of personality (as contrasted
with inventories).

Participation and LGD performance.
According to Stogdill, leaders,
in general, tend to be more talkative,
more industrious, and more likely to
participate in group activities. The
same appears to be true of high
scorers on the LGD who, compared
with low scorers, were found to make
1.5 times as many responses in a
personal interview, and to give 1.6
times as many responses to the Rorschach
(18). LGD success also has been
found, for 140 sorority members, to
correlate .36 with the number of
university student leadership positions
held per semester, and .10
with the number of sorority positions
held per semester (23).

Parental leadership and LGD performance.
It is expected that childhood
experiences and memories of
them should play a significant role
in adolescent or adult leadership behavior
(13). In an unpublished study,
the author hypothesized that LGD
performance would be a function of
the participants’ perception of how
they had been led by their parents.
Examinees used modified Ohio State
Leadership Studies behavior check
lists (38) to describe the extent to
which their mothers and fathers initiated
structure for them and were considerate
of them. The average and
range of correlations between such
descriptions and LGD performance
suggested that no consistent relationship
existed between parental
descriptions and LGD performance.

Empathy and LGD performance. 
A series of studies (e.g., 33, 67) has 
found a moderate relationship be- 
tween empathic ability and leader- 
ship. The results have not been uni- 
formly positive, mainly because the 
measure of empathy has varied great- 
ly from one investigation to another. 
Elsewhere (13), the author has deduced
that the more a person can
accurately estimate the needs of
others, the more likely he is to successfully
lead others. Chowdry and
Newcomb (33) have come closest to
testing this hypothesis, and have
obtained positive results.

Kerr and Dymond Empathy test 
data collected on a small sample 
(N’s=22 to 39) of students by 
Stolper (64) were correlated by the 
author with LGD performance 
scores, as shown in Table 5. The cor- 
relation of .36 between accuracy of es- 
timations of group ratings of self on the 
Dymond test and LGD performance 
was somewhat artifactual since LGD 
leaders probably led in the Dymond 
test also, and all members of a group 
are likely to agree more closely on 
leaders’ ratings compared to non- 
leaders, according to an earlier study 
by the author (6). 


Leadership Performance in Other 
Quasi-Real Situations and LGD 
Performance 


According to Fig. 1, leadership in
other quasi-real situations is governed
by the same individual factors as
performance in the LGD. Therefore,
the two should correlate fairly highly
with each other if successful leadership
is being measured in these other
situations, and if ratings of LGD performance
are valid as leadership ratings.
Similar deductions can be
made about LGD performance and
success as a leader in real life.¹³

Table 6 summarizes the 17 avail- 
able correlations between leadership 
or “desirability for leadership positions”
ratings based on LGD’s and
other situational tests. The correla- 
tions are almost uniformly highly 
positive, but many are contaminated 
because the same raters were used to 
assess candidates during the LGD 
and the other situations. 

The median correlation between
interview and LGD ratings is .70.
Where ratings are made by different
raters in each situation, the correlation
drops to .45 (9).

13 The same alternative interpretation and
rebuttal apply here as are presented in footnote 6.

Correlations of .37 and .67 were
found between ratings based on assigned
leadership and leaderless discussion
(3, 59). Correlations between
the LGD and other verbal situational
tests, such as debates and committee
work, are almost as high as between
the LGD and its retest, averaging
.70. As the other situations
involve less discussion and more mechanical
or athletic activity, the median
correlation between the LGD
and these other situational tests is
reduced to around .52.

Leadership ratings based on the 
leaderless group discussion alone 
correlated .64 with the final leader- 
ship ratings based on the entire OSS 
battery of situational tests (59). 


TABLE 6

CORRELATION BETWEEN PERFORMANCE IN OTHER QUASI-REAL SITUATIONS
AND IN THE LGD

Subjects | Other Quasi-Real Situational Tests | Trait Measured | Correlation
1. 223-442 OSS applicants (59) | Interview | Leadership | .48
2. 123 foreign service candidates (69) | Interview | Desirability for job | .75
3. 202 administrator candidates (69) | Interview | Desirability for job | [illeg.]
4. 64 sales management trainees (9) | Interview with different raters | Desirability for job | .45
5. 14 shoe factory trainee executive candidates (65) | Interview and tests | Desirability for job | [illeg.]
6. 223-442 OSS applicants (59) | Assigned leadership | Leadership | [illeg.]
7. [illeg.] administrator candidates (3) | Assigned leadership | “Personality and ability” | [illeg.]
8. 223 OSS applicants (59) | Debate | Leadership | [illeg.]
9. 223-442 OSS applicants (59) | Brook | Leadership | [illeg.]
10. 223-442 OSS applicants (59) | Construction | Leadership | [illeg.]
11. 40 NROTC subjects (31) | Leaderless Mechanical Assembly, 4-man | Leadership | [illeg.]
    (same subjects) | Leaderless Mechanical Assembly, 8-man | Leadership | [illeg.]
    (same subjects) | Leaderless Group Reasoning Task, 4-man | Leadership | [illeg.]
    (same subjects) | Leaderless Group Reasoning Task, 8-man | Leadership | [illeg.]
12. 320 foreign service candidates (69) | Other situational tests such as committee work | Desirability for job | [illeg.]
13. 202 administrator candidates (69) | Other situational tests such as committee work | Desirability for job | [illeg.]


Leadership Performance in Real-Life
Situations and LGD Performance


Table 7 summarizes the correlations
between LGD performance and
real-life leadership performance.¹⁴
While LGD ratings correlate somewhat
with the tendency to hold leadership
offices, they appear to be
associated more with the tendency in
real life to initiate structure and interaction
among associates and subordinates
(r = .32). On the contrary,
a low negative correlation exists between
LGD performance and the
tendency in real life to be considerate
of the welfare of subordinates and
associates (r = -.25).¹⁵

14 An attempt has been made in this article
to treat separately from the more numerous
studies of LGD performance correlated with
merit ratings of real-life performance (Table
3), studies of the LGD correlated with objective
indices of real-life leadership performance
or fairly nonevaluative descriptions of real-life
leadership performance. While usually empirically
related, merit rating of performance
is considered conceptually independent of
actual performance or descriptions of performance.

15 See footnote 7.

16 See footnote 7.


Studies of Factors Associated with
LGD Performance


Two factorial studies analyzing
LGD test and retest scores in a correlation
matrix of real-life, situational
test, and psychological test measures
are available. The first (23)
was based on 41 measurements of
140 sorority girls; the second concerned
14 measures made of 66 to 244
ROTC cadets.¹⁶ Table 8 lists the
loadings of the LGD test and retest
on the factors that accounted for
most of the variance of the LGD.
In line with propositions stated
earlier, ratings based on LGD performance
appear to assess the extent
to which an individual initiates
structure or is socially bold in ambiguous
situations. Esteem, or personal
worth, and verbal ability are
also involved. On the basis of these
studies it may be inferred that there
exist three independent sources of
variance underlying leaderless group
discussion performance:
I. Tendency to Initiate Structure
II. Tendency to Be Esteemed or of Value to a Group
III. Ability in Verbal Situations.
Little specific variance is left when
the common variance accounted for
by these factors is extracted.


TABLE 7

CORRELATION BETWEEN LGD PERFORMANCE AND REAL-LIFE BEHAVIOR

Subjects | Method of Measurement | Real-Life Behavior Measured | Correlation
1. 140 sorority girls (23) | Extent of extracurricular activities | Participation in social groups | .32
2. 140 sorority girls (23) | Number of university leadership positions held | Real-life leadership performance | .36
3. 140 sorority girls (23) | Number of sorority leadership positions held | Real-life leadership performance | .10
4. 140 sorority girls (23) | Extent peers know her well enough to rate | Visibility among associates | [illeg.]
5. 180 ROTC cadets* | Final cadet rank achieved | Future success as a leader | [illeg.]
6. 133 ROTC cadet officers* | Ratings of subordinates and peer associates | Degree of “consideration of subordinates” | -.25
7. 133 ROTC cadet officers* | Ratings of subordinates and peer associates | Degree of initiation of structure and interaction among associates and subordinates | .32
8. [illeg.] sales trainee candidates (9) | Number of leadership positions held previously | Real-life leadership performance | ‡

* Unpublished data.
† Eta (curvilinear relationship).
‡ Significant difference at 15 per cent level between LGD score of experienced leaders and those without experience as leaders.

TABLE 8

ORTHOGONAL FACTORS CORRELATED WITH OVER-ALL LGD PERFORMANCE

Factor | Correlation in Sororities (23)
I. Leadership Potential (Esteem) | .24
II. Ascendency-Sociability | .39
III. Verbality | .51
IV. Intellectualism | [illeg.]

Factor | Correlation in ROTC cadets*
I. Esteem | [illeg.]
II. Tendency to Initiate Structure | [illeg.]
III. Ability in Verbal Situations | [illeg.]

* Unpublished data.


SUMMARY

The history, applicability, reliability,
and validity of the leaderless
group discussion as a means of assessing
variations among persons in the
tendency to exhibit successful leadership
behavior have been considered.
While the procedure was originated as
a psychological technique in Germany
over thirty years ago, it is
only in the last decade that systematic
reliability and validity studies
have appeared.

High interrater agreement and 
high test-retest reliabilities have been 
reported consistently, especially 
where descriptive behavior check 
lists have been used as the rating 
technique. 

Group size, length of testing time,
type of problem presented, directions,
seating arrangement, number of raters,
and rating procedure influence
to a greater or lesser extent performance
in the LGD as well as the
reliability and validity of LGD ratings.
Studies of the effects of many
of these have appeared since 1950.

According to both deductive and
inductive evidence, a valid assessment
of the tendency to display
successful leadership should correlate
with: (a) status as measured by rank,
when the assessment is based on performance
among associates of different
rank; (b) esteem in real life as estimated
by merit ratings; (c) successful
leadership performance in other
quasi-real and real-life situations; (d)
personal characteristics as measured
by psychological tests, such as capacity,
proficiency, responsibility, and
participation. On the whole, ratings
based on performance in the LGD
tend to do this. Therefore, it is inferred
that they have some validity
as assessments of the tendency to
display successful leadership, i.e.,
leadership potential.

Other evidence suggests that the 
successful leadership behavior ob- 
served in the LGD concerns primarily 
initiation of structure rather than con- 
sideration of the welfare of others. 

The preceding analysis, coupled 
with recommendations made by 
others working in the field of situa- 
tional tests, such as Weislogel (71), 
leads us to the following hypotheses: 


1.€., 
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1. To maximize the reliability and 
validity of the LGD and other situa- 
tional tests, scoring techniques should 
minimize reliance on the ability of 
observers to infer differences in per- 
sonality traits and future tendencies 
among examinees. Observers should 
merely report or evaluate the im- 
mediate behavior they observe. For 
example, in an unpublished study, 
the author found that two Army 
colonels’ estimates of the potential 
as Army officers of ROTC examinees 
were less valid as predictors of the 
merit ratings of the examinees than 
were the colonels’ check list descrip- 
tions of who initiated structure dur- 
ing the LGD. Similar results were 
noted in a study of fraternity mem- 
bers reported by the author and 
White (20). 

When the observer makes an infer- 
ence about the future behavior of an 
examinee from the observations of 
the examinee during the LGD, 
several potential errors are likely. 
The observer may err in deciding on 
which dimensions to make inferences; 
he may err in collating his observa- 
tions with the future behavior to be 
predicted; and finally, the dimen- 
sions on which the inferences are 
made may be private ones which 
cannot be shared with other obser- 
vers. The errors may be constant, 
variable, or both. Lack of knowl- 
edge and control over such errors 
disappears when raters are merely 
asked to describe what they observed 
and these descriptions are used as 
predictors. 

Further reduction of uncontrolled 
raters’ errors may be made in the 
following ways: 

a. Objective criteria for describing 
specific behaviors can be used (71). 
In the LGD, the actual number of 
times a participant suggests a new 
approach to a problem can be noted
instead of rating “to what extent did
the participant suggest new
approaches to problems.”

b. Forced-choice check lists can
be used instead of present check lists.
Otis’ (60) recent successful applica- 
tion of the forced-choice technique to 
interviewer ratings indicates promise 
for applying the same procedure to 
the LGD. 

2. To maximize validity, problems 
that are equally ambiguous to all 
participants, and that require the 
initiation of structure for their solu- 
tion, should be used. Where interest 
is in forecasting leader behavior in 
real life, the structure to be set up 
should approximate the real-life set- 
ting as much as possible. 

3. Since the LGD correlates fairly 
highly with most other intellectual 
or verbal situational tests, the use of 
many situational tests in a battery to 
forecast leadership potential is of 
doubtful utility. Thus, leadership 
ratings based on a one-hour LGD cor- 
related above .60 with leadership 
assessments based on the three days 
of OSS situational testing (59). A
similar correlation between the LGD 
and an entire battery of situational 
tests was found by Vernon (69). 
However, a significant proportion of 
the variance in over-all potential as 
a successful leader, unaccounted for 
by LGD, may be predicted by a fairly 
pure active or mechanical, initially 
leaderless, situational test which min- 
imizes variance due to verbal ability. 

4. Compared to paper-and-pencil 
techniques, the LGD is expensive; 
compared to the individual interview, 
in many locales, it may prove eco- 
nomical. The LGD appears feasible 
administratively, especially in mili- 
tary programs screening OCS or ad- 
vanced ROTC applicants, in civil 
service examinations, in screening
college seniors who are to be assessed
at their colleges for management
trainee positions, and anywhere else
where “boards” have been used
traditionally, such as in the selection 
of public school teachers. 

5. While the LGD appears to have 
some validity as a predictor of the 
tendency to be a successful leader in 
a number of situations, especially in 
comparison to other assessment techniques,
tailor-made batteries of paper-and-pencil
tests will undoubtedly
yield higher validities in designated
situations. However, it may be that 
just as the brief intelligence test 
is applicable for predicting train- 
ability for many skilled occupa- 
tions, so the LGD will provide a 
general technique for partially as- 
sessing potential success as a leader 
in a relatively wide range of situa- 
tions. 

6. A number of situations in which 
the LGD is less likely to be success- 
ful may include the following: 

a. The LGD is less likely to be 
valid for measuring or forecasting 
esteem or leadership potential when 
examinees can be tested only among 
others of different rank. In such a 
case, status—and not esteem or per- 
sonality—will determine who suc- 
ceeds in the LGD. 

b. The LGD is less likely to be 
useful where factors peculiar to the 
situation block initiation of structure 
where no structure exists. Conceiv- 
ably, in certain military settings for 
example, the examinees may be im- 
bued with the dictum ‘‘never volun- 
teer for anything.’’ However, exactly 
how this would affect LGD validities 
is unknown. 

c. Another unknown is the effect 
of the average verbal aptitude and 
educational status of the participants 
on the validity and utility of the 
LGD. It is expected that where this 
mean falls below a certain minimum,
LGD forecasting efficiency may suffer.

d. Since achievement and intelligence
appear to correlate with LGD
performance as well as with success
as a leader in real life, any restriction
in the range of intelligence or achievement
of LGD participants would be
likely to reduce the forecasting efficiency
of the LGD. Conversely, the
greater the variance in intelligence


THE PARADOX


In recent years several authors 
have called attention to a paradox in 
test theory. Gulliksen (7) appears to 
have been the first. He showed that, 
under reasonable conditions, “In
order to maximize the reliability and
variance of the test, items should
have high intercorrelations, all items
should be of the same difficulty level,
and this level should be as near to
50% as possible.” But, he continued,
“The criterion of maximizing test
variance cannot be pushed to extremes.
Test variance is a maximum
if half of the population makes zero 
scores, and the other half makes per- 
fect scores. Such a score distribution 
is not desirable for obvious reasons, 
yet current test theory provides no 
rationale for rejecting such a score 
distribution” (7, pp. 90-91). 
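
Gulliksen's point about maximal variance can be checked numerically. The sketch below uses hypothetical score distributions for a 10-item test (illustrative values, not data from any study cited here) and compares the variance of a half-zero, half-perfect distribution with that of an evenly spread one.

```python
from statistics import pvariance

N_ITEMS = 10  # hypothetical test length, chosen only for illustration

# Half the examinees score zero and half obtain perfect scores.
all_or_none = [0] * 50 + [N_ITEMS] * 50

# Scores spread evenly over every possible value 0..10 (ten examinees each).
evenly_spread = [s for s in range(N_ITEMS + 1) for _ in range(10)]

var_all_or_none = pvariance(all_or_none)
var_evenly_spread = pvariance(evenly_spread)

# No score bounded by 0 and n can have a variance larger than (n / 2) ** 2.
upper_bound = (N_ITEMS / 2) ** 2

print(var_all_or_none, var_evenly_spread, upper_bound)
```

The all-or-none distribution reaches the upper bound, although it separates examinees into only two classes, which is the undesirable outcome the quotation describes.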

All studies to be reviewed in this 
paper assume homogeneous tests, in 
the sense that all correlations be- 
tween items within a test are ac- 
counted for by a single common fac- 
tor. Validity, as used herein, refers to 
correlation of the test with that com- 
mon factor. While the studies are 
explicitly concerned with reliability, 
in this context reliability refers only
to degree of homogeneity. There is
no concern with stability of functions 
over time. 

Tucker stated the paradox as fol- 
lows: “Consider the case when all 
the items in a test are equivalent; 
that is, when the items all measure 
the same trait, have equal reliabili- 
ties, and are equally difficult. In this 
case the items are equally intercor- 
related with coefficients equal to the 
item reliabilities. . . . If the reliability
of the items were increased to
unity, all correlations between the 
items would also become unity and a 
person passing one item would pass 
all items and another failing one item 
would fail all items. Thus the only
possible scores are a perfect one or one
of zero” (19, pp. 1-2). He pointed
out, as a consequence of this paradox, 
that under these circumstances in- 
creasing the reliability of a test be- 
yond a certain point will decrease the 
validity of the test, in contradiction 
to the usual belief, embodied in the 
correction for attenuation, that in- 
creasing the reliability of the test al- 
ways increases its validity. 
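
The "usual belief" mentioned here follows from the classical correction for attenuation, under which an observed validity cannot exceed the square root of the product of the two reliabilities. The following is a minimal sketch of that classical relation only; the function names are illustrative and do not come from any of the papers reviewed.

```python
import math

def disattenuated_r(r_xy, r_xx, r_yy):
    """Classical correction for attenuation: estimated correlation of true scores."""
    return r_xy / math.sqrt(r_xx * r_yy)

def classical_validity_ceiling(r_xx, r_yy=1.0):
    """Largest observed validity the classical formula allows for given reliabilities."""
    return math.sqrt(r_xx * r_yy)

# Under the classical formula the ceiling rises steadily with test reliability.
ceilings = [classical_validity_ceiling(r) for r in (0.5, 0.7, 0.9)]
print(ceilings)
```

In this classical picture the ceiling is a monotonic function of reliability, which is exactly the expectation that the attenuation paradox contradicts.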

Brogden conceived of the problem 
as one of “determining the distribution
of item difficulties which will maxi- 
mize the correlation of the test with a 
perfect measure of the characteristic 
the test is intended to measure” 
(1, p. 197). With perfectly valid 
items, he pointed out, the difficulties 
should be equally spaced in some 
sense, whereas with items that do not 
intercorrelate, the difficulties should 
all be as close to .5 as possible. Brogden
stated that item selection procedures
aimed solely at increasing the
reliability of the test may result in a
decrease in validity, again, the “attenuation
paradox.”

Two writers who missed the attenuation
paradox were Loevinger
(11) and Cronbach (3). Loevinger
called attention to the anomalous
result of using equivalent items when
the item intercorrelations approached
unity and concluded that equivalent
items were undesirable in the usual
case. Since the phi coefficient and
KR 20 (Kuder and Richardson's [10]
formula 20) measure the departure of
the items from equivalence as well
as the interrelationships of the underlying
variables, she concluded that
they were inappropriate as item-selection
coefficients.

Cronbach, in refuting Loevinger,
stated: “The phi coefficient which
tells when items do and do not duplicate
each other is a better index
just because it does not reach unity
for items of unequal difficulty” (3,
p. 329). Here Cronbach neglected
the fact that if one uses the phi 
coefficient for item selection, one 
needs two rules: For lower values of 
phi, the higher the coefficient, the 
more will the two items contribute 
to the validity of the test. But for 
high values of phi, the lower the 
coefficient, the more will the two 
items contribute to the validity of the 
test. 
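
The property of phi at issue, that it cannot reach unity for items of unequal difficulty, can be illustrated with a short sketch. The difficulties .3 and .7 are hypothetical values chosen for the example; the bound is computed by letting everyone who passes the harder item also pass the easier one.

```python
import math

def phi(p1, p2, p11):
    """Phi coefficient for two dichotomous items.

    p1, p2: proportions passing each item; p11: proportion passing both.
    """
    return (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

def phi_max(p1, p2):
    """Largest attainable phi: all who pass the harder item also pass the easier."""
    hard, easy = min(p1, p2), max(p1, p2)
    return phi(hard, easy, hard)

equal_difficulty = phi_max(0.5, 0.5)     # items can duplicate each other fully
unequal_difficulty = phi_max(0.3, 0.7)   # perfectly ordered, yet phi stays below 1
print(equal_difficulty, unequal_difficulty)
```

Two items that order examinees perfectly but differ in difficulty therefore receive a lower phi than two duplicate items, so a selection rule based on phi alone cannot tell redundancy from useful spread.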

Similarly, Cronbach showed that 
the maximum value of KR 20 is not 
much less than unity for items with 
a specified distribution of item diffi- 
culties, and that the maximum value 
will drop for a greater range of item 
difficulties. If, however, maximizing 
KR 20 is made a rule for item selec- 
tion, as Cronbach recommends, there 
will be a tendency to select items with 
a narrower range of difficulty, or, 
in Cronbach's terms, more redundant 
items. Here Cronbach apparently
failed to see the paradox that whereas
maximizing KR 20 for constant
number of items will lead to increas- 
ing the validity of tests where the 
item intercorrelations are low, it will 
lead to decreasing the validity of 
tests where the item intercorrelations 
are high. 
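
A small computation shows why maximizing KR 20 favors a narrow range of difficulty. The sketch below applies the standard Kuder-Richardson formula 20 to two hypothetical response matrices (invented for illustration): three duplicate items of difficulty .5, and three perfectly scalable items of difficulties .75, .50, and .25. Population variance is used for the total scores, which is an assumption of this sketch.

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items matrix of 0/1 scores."""
    k = len(responses[0])
    n = len(responses)
    totals = [sum(person) for person in responses]
    p = [sum(person[i] for person in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

# Three equivalent, perfectly intercorrelated items of difficulty .5.
equivalent = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]]
# Three perfectly scalable items spread over difficulties .75, .50, .25.
spread = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]

kr20_equivalent = kr20(equivalent)
kr20_spread = kr20(spread)
print(kr20_equivalent, kr20_spread)
```

The duplicate items yield the higher coefficient even though the spread items distinguish four score levels rather than two.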

The resolution of the attenuation 
paradox thus lies in having two rules 
for test construction. For the “classical
region,” the region in which the
attenuation of validity decreases 
with increase in reliability, the closer 
the items are to difficulty of .5 and 
thus to equivalence, the more reliable 
and more valid will the test be. For 
the “region of paradox” the optimal
distribution of item difficulties must 
be determined as a function of item 
intercorrelations. This solution was 
implied by Brogden and stated by 
Davis (5). 


DEFINITION OF THE REGION 
OF PARADOX 


Four studies have contributed to 
the definition of the region of para- 
dox, those of Tucker (19), Brogden 
(1), Davis (5), and Cronbach and 
Warrington (4). 

Tucker (19) assumed that ability 
or true score is normally distributed, 
that probability of success on any 
item is related to true score by the 
normal ogive, and that all items are 
equally difficult. He found that with 
median equivalent items, i.e., equivalent
items of difficulty equal to .5,
the validity of the test constantly in- 
creased as the item reliability in- 
creased for a one-item test. For a 10- 
item test optimal item reliability 
(interitem correlation) was about .5, 
and for a 100-item test optimal item 
reliability was about .25. The maximum
validity was .8 for a one-item
test or for a test with perfectly reliable
items, however many, since the
latter case is equivalent to that of
one item. Maximum validity was .97 
with 100 items and slightly over .9 
with 10 items. 
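
The pattern of results for median equivalent items can be sketched in closed form. The code below is not Tucker's derivation; it is a minimal reconstruction under the assumptions stated in the text (normally distributed ability, normal-ogive items, all difficulties at .5) plus local independence. It is parameterized by the tetrachoric interitem correlation `rho`, and the conversion to a phi-type item reliability in `phi_from_tetrachoric` is an assumption made here for comparison with the reported figures.

```python
import math

def validity_median_items(n_items, rho):
    """Correlation of number-right score with ability for n equivalent items
    of difficulty .5, with tetrachoric interitem correlation rho."""
    shared = math.asin(rho) / (2 * math.pi)        # variance of P(pass | ability)
    cov = math.sqrt(rho / (2 * math.pi))           # covariance of one item with ability
    var_per_item = 0.25 + (n_items - 1) * shared   # Var(score) / n_items
    return math.sqrt(n_items) * cov / math.sqrt(var_per_item)

def phi_from_tetrachoric(rho):
    """Phi between two median items whose tetrachoric correlation is rho."""
    return 2 * math.asin(rho) / math.pi

def best_rho(n_items, grid_size=1000):
    """Grid search for the rho in (0, 1] that maximizes validity."""
    grid = [i / grid_size for i in range(1, grid_size + 1)]
    return max(grid, key=lambda rho: validity_median_items(n_items, rho))

for n in (1, 10, 100):
    rho = best_rho(n)
    print(n, round(phi_from_tetrachoric(rho), 2),
          round(validity_median_items(n, rho), 3))
```

Under these assumptions the one-item validity rises steadily with `rho`, every test converges on the same terminal validity as `rho` approaches unity, and the optimal interitem correlation falls as the number of items grows.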

Tucker investigated a single case 
of non-median equivalent items. All 
items were of such difficulty that the 
ability level where the probability of 
passing is .5 was one standard devia- 
tion above the mean. (It will be 
convenient to refer to this case as 
tests for which z=1. Results are identical
for z=−1, that is, for tests
where all items are of such difficulty 
that the ability level where half the 
individuals pass the item is one 
standard deviation below the mean.) 
In this case the optimal interitem 
correlation for a 10-item test was 
about .3, for a 100-item test about 
.15. The corresponding validities
were about .83 and .96. For a one-item
test the validity again increased
as a function of item reliability to a
maximum value of about .66. This
value is also the terminal validity 
for a test of any number of items as 
the item reliability approaches unity, 
given that all items are of the speci- 
fied difficulty level. 
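
The non-median case can be explored with numerical integration. The sketch below is an illustration under assumed conditions, not Tucker's own computation: ability is standard normal, items are locally independent, and every item has the curve P(pass | theta) = Phi(a(theta − z)) with a = sqrt(rho / (1 − rho)), so that `rho` is the tetrachoric interitem correlation and `z` is the ability level at which half the individuals pass.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def validity_equivalent_items(n_items, rho, z, steps=4000, limit=8.0):
    """Validity of n equivalent items whose 50% point lies at ability z (rho < 1)."""
    a = math.sqrt(rho / (1.0 - rho))
    h = 2.0 * limit / steps
    mean_p = mean_p2 = cov = 0.0
    for i in range(steps + 1):          # trapezoidal rule over ability
        theta = -limit + i * h
        weight = normal_pdf(theta) * h
        if i in (0, steps):
            weight *= 0.5
        p = normal_cdf(a * (theta - z))
        mean_p += weight * p
        mean_p2 += weight * p * p
        cov += weight * theta * p       # E[theta * p]; ability has mean zero
    var_p = mean_p2 - mean_p ** 2
    var_score = n_items * (mean_p - mean_p2) + n_items ** 2 * var_p
    return n_items * cov / math.sqrt(var_score)

median_100 = validity_equivalent_items(100, 0.3, 0.0)
offset_100 = validity_equivalent_items(100, 0.3, 1.0)
one_item_limit = validity_equivalent_items(1, 0.99, 1.0)
print(round(median_100, 3), round(offset_100, 3), round(one_item_limit, 3))
```

With the interitem correlation held fixed, moving the difficulty one standard deviation from the median lowers validity, and the one-item value approaches the terminal validity as the item nears perfect reliability.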

The optimal values of item reli- 
abilities under his conditions were 
surprisingly low, Tucker pointed out; 
however, he cautioned that exceeding 
these values caused less decrement in 
validity than falling short of them. 
Further, where item difficulties are 
not all equal, his results do not hold 
exactly. 

Brogden (1) assumed that the true 
score of ability was normally dis- 
tributed, that the tetrachoric cor- 
relations of all pairs of items within 
a test were equal, and the biserial r 
of each item with true score was the 
same for all items in a test. Phi co- 
efficients between items and point 
biserials between items and true score 
were not always constant within a
test because items were permitted to
differ in difficulty. Validity coefficient 
of the test was studied as a function 
of tetrachoric r of items, number of 
items, and distribution of item diffi- 
culties. Item inter-r’s had the values 
.2, .4, .6, and .8. Numbers of items 
were 9, 18, 45, 90, and 153. All item 
difficulties were concentrated at half 
sigma units between and including 2.0 
and −2.0. The types of distributions
were rectilinear, normal, skewed, and 
constant at each of the sigma values.

Some of Brogden’s results are dis- 
played in Fig. 1, 2, and 3. Figure 1 
shows validity as a function of item 
intercorrelation for tests composed of 
median equivalent items, that is, 
items for which z=0. The four curves 
correspond to four values for n, the 
number of items. Figure 2 is similar 
to Fig. 1, except that all items are 
of difficulty z=1. (The curves of 
Fig. 2 also apply if all items are of 
difficulty z=−1.) While Fig. 2
again shows the attenuation paradox 
as a function of number of items, 
comparison of Fig. 1 and 2 shows the 
attenuation paradox as a function of 
item difficulty. Figure 3 shows valid- 
ity as a function of distribution of 
item difficulties for tests composed of 
90 items. The curve for z=0 is also 
a member of another family of curves 
shown in Fig. 3. These curves show
what happens to a test of 90 equivalent
items as the difficulty departs
from median value. The data for
these graphs are taken from Brogden’s
Tables 2 and 3.

FIG. 1. ATTENUATION PARADOX AS A FUNCTION OF NUMBER OF ITEMS FOR TESTS COMPOSED ...

FIG. 2. ATTENUATION PARADOX AS A FUNCTION OF NUMBER OF ITEMS FOR TESTS COMPOSED OF EQUIVALENT ITEMS OF DIFFICULTY z=1. DATA FROM BROGDEN’S (1) TABLE 3.

Reading across the three figures, 
one sees that, in general, validity 
increases with increasing item inter-r 
up to a point and then decreases. 
The only curve which continues to
rise in these figures is that for rectangular
distribution of item difficulties.
For a large number of non-median
equivalent items, the optimal
item inter-r is apparently less than
.2. Paradox is exhibited whenever
validity decreases as reliability (item 
inter-r) increases, that is, whenever 
the curve slopes downward.  Brog- 
den’s method, unlike Tucker's, does 
not permit ascertaining exactly the 
optimal value of item inter-r, so the 
computed points have been con- 
nected by straight lines rather than 
attempting to sketch a curve. Tuck- 
er’s article contains actual curves 
which may be compared with the 
figures here. 

Reading up and down Fig. 1 and 
2, one sees that validity always in- 
creases with increasing number of 
items, provided item inter-r and dif- 
ficulty distribution are held constant; 
however, the increase in validity obtained
by increasing n becomes smaller
as the item inter-r increases.


Figures 1 and 2 show that optimal 
item inter-r decreases as n increases
for constant item difficulty; in other
words, the region of paradox increases
as n increases. Comparison
of Fig. 1 and 2 illustrates that valid- 
ity drops as item difficulty departs 
from the median value for constant 
n and item inter-r. The same com- 
parison illustrates that for tests com- 
posed of equivalent items, the region 
of paradox increases as the item diffi- 
culty departs from the median. 
(Support for these generalizations is 
contained in further data not repro- 
duced here.) 

The two highest curves of Fig. 3 
show an important manifestation of 
the attenuation paradox: Median 
equivalent items produce tests of 
higher validity for low values of 
item inter-r, while a rectangular dis- 
tribution of difficulty produces tests 
of higher validity for high values of 
item inter-r. The curve for normal 
distribution of item difficulties lies 
between that for rectangular distri- 
bution and that for z=0 but was 
omitted to make the figure more legi- 
ble. The point at which rectangular 
distribution produces higher validity 
than median equivalent items may
be taken as defining the limits of the
region of paradox. Utilizing data 
from Brogden’s Table 2 shows the 
following: Higher validity was ob- 
tained by tests of median equivalent 
items than by tests with distributed 
difficulties for item inter-r’s of .2 and 
.4; however, for r of .4 and large num- 
ber of items (90 or 153), there was 
almost no difference in validity for 
tests with median equivalent items 
and tests with item difficulties dis- 
tributed normally or rectilinearly. 
For item inter-r equal to .6, the test 
with median equivalent items was 
superior for a 9-item test, ran about 
equal with tests of rectilinear and 
normal distribution of item difficulties 
for 18- and 45-item tests, and ran
slightly behind those with distributed
difficulties for 90 and 153 items. For 
item inter-r’s equal to .8, distributed 
difficulties were clearly superior to 
equivalent items, more so as the number
of items increased. The rectilinear
had a slight advantage over the
normal distribution for tests with
more than 18 items.

The curve in Fig. 3 for skewed dis- 
tribution of item difficulties falls far 
below the curves for rectangular, 
normal, and median equivalent items. 
This decrement is chiefly a result not 
of skewness but of the departure of 
mean item difficulty from z=0. The 
rectangular and normal distributions 
do have means at z=0, whereas for 
the skewed distribution the mean 
(computed by the reviewer) is at 
z=1.21.

For a given item inter-r, where 
tests are composed of items of con- 
stant difficulty, the validity falls off 
fairly sharply with the departure of 
that difficulty from the median; more- 
over, the greater the departure of the 
difficulty from the median, the lower 
the optimal value of the item inter-r. 
Comparing now the curve for skewed 
difficulties, which corresponds to a
mean z=1.21, with tests of constant
difficulty equal to 1 and to 1.5, one 
obtains at least a hint that distribu- 
tion of item difficulties protects the 
test from decrement of validity due to 
departure of difficulty from the medi- 
an, at least for moderate and high 
values of item inter-r. More striking 
is the fact that despite the distribu- 
tion of item difficulties, validity con- 
stantly falls with increasing item 
inter-r within the range considered, 
when the mean difficulty of the items 
is not appropriate for the group tested 
and the number of items is large. 
Practical considerations often lead 
to the use of a test with a group whose 
mean ability is not exactly the same 
as the mean difficulty of the items. 
Thus this important effect deserves 
much more extended investigation. 

Davis (5) assumed that the most 
desirable distribution of test scores 
is a rectangular one and sought the 
conditions under which such a dis- 
tribution could be obtained. He con- 
cluded that where the tetrachoric 
item inter-r’s are .5 or less, the clos- 
est approach to a rectangular dis- 
tribution will be obtained with medi- 
an equivalent items. As the item 
inter-r rises above .5, the item diffi- 
culties should be increasingly dis- 
persed. In comparing Davis’ result 
with those of Tucker and Brogden, 
note that the latter investigators as- 
sumed that the true scores were nor- 
mally distributed. Validity coeffi- 
cients thus could equal unity only if 
obtained scores also were normally 
distributed. In effect, then, Tucker 
and Brogden assumed a normal dis- 
tribution as a desideratum while 
Davis assumed a rectangular distribution.§

§ Davis’ derivation, obtained from him in
mimeographed form, appears to be somewhat
less rigorous than the other two.

Cronbach and Warrington (4) assumed
that the probability of success
on any item is described by the normal
ogive; however, in contrast to
the preceding studies, in which per- 
sons with no ability were assumed to 
have zero probability of passing the 
item, they dealt only with three- 
choice items. Thus persons with no 
ability had a 1/3 chance of passing
any item. The standard deviation of
the ogive, σ_d, was the same for all
items within a test. Most of the tests
had 30 items. The underlying ability
was assumed to be normally distributed.
The scale value of each item
is that level of ability for which the
probability of passing the item is 2/3
(or, if corrected for chance, 1/2).
Items were located at scale values
between 2.5 and −2.0, inclusive, at
intervals of .5. Pattern A had 30
items at scale value 0; pattern B
had 6 at .5, 18 at 0, and 6 at −.5;
pattern C had 6 each at 1.0 to −1.0
inclusive; and pattern D had 3 each
at 2.5 to −2.0 inclusive. Cronbach
and Warrington’s chief problem was
outside the realm of the attenuation
paradox. They were concerned with
the screening efficiency of the tests
for various possible cutting scores
from close to zero to over 90%.
Screening efficiency was measured by
r_bis of the dichotomized score scale
against the continuous, normally distributed
scale of ability. For each
test a series of such validity coefficients
was obtained and its curve
plotted.

For perfectly precise items (σ_d=0)
the validity of pattern A was greatest
only in the neighborhood of a
cutting score of .5, and fell sharply
below that of other patterns as the
cutting score deviated from the median
point. The range of cutting scores
for which pattern A was superior
increased as precision decreased until,
for values of σ_d≥1, pattern A
was superior throughout most of the
range. A somewhat similar family of
curves was generated by holding pat- 
tern constant and letting item pre- 
cision vary; that is, the greater the 
item precision, the more valid the test 
for cutting scores near the median, 
but the less valid for extreme cutting 
scores. This tendency was particularly
clear for pattern A, which they
called a “peaked” test, and is a
manifestation of the attenuation par- 
adox. 
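
The three-choice item model and the four item patterns described above can be written out as a short sketch. The item curve used here (a guessing floor of 1/3 plus a normal ogive for the remaining probability) is an assumption consistent with the description in the text, not a formula quoted from Cronbach and Warrington; `ogive_sd` stands for the standard deviation of the item ogive.

```python
import math

CHANCE = 1.0 / 3.0  # probability of success by guessing on a three-choice item

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_pass(ability, scale_value, ogive_sd):
    """Assumed item curve: guessing floor plus a normal ogive above it."""
    knows = normal_cdf((ability - scale_value) / ogive_sd)
    return CHANCE + (1.0 - CHANCE) * knows

def corrected_for_chance(p):
    """Proportion passing after removing the share attributable to guessing."""
    return (p - CHANCE) / (1.0 - CHANCE)

# Item layouts as {scale value: number of items}.
half_steps = [x / 2 for x in range(5, -5, -1)]  # 2.5, 2.0, ..., -2.0
patterns = {
    "A": {0.0: 30},
    "B": {0.5: 6, 0.0: 18, -0.5: 6},
    "C": {v: 6 for v in (1.0, 0.5, 0.0, -0.5, -1.0)},
    "D": {v: 3 for v in half_steps},
}

p_at_scale_value = p_pass(0.7, 0.7, 0.5)  # ability equal to the item's scale value
print(p_at_scale_value, corrected_for_chance(p_at_scale_value))
print({name: sum(layout.values()) for name, layout in patterns.items()})
```

At an ability equal to the scale value the assumed curve gives a passing probability of 2/3, or 1/2 after correction for chance, and each of the four layouts contains 30 items.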

Cronbach and Warrington pro- 
posed an interesting integration of 
their findings. They suggested that 
increasing the variance of the scale
values of the items (σ_s²) has about the
same effect as increasing the variance
of the item ogive (σ_d²), and that the
important quantity to be considered
is the sum of the two variances.
Using eta rather than r as a measure
of over-all validity, they found that
under their conditions validity increased
as a function of σ_s²+σ_d² to
a maximum where the sum of the 
variances was about .5, then slowly 
declined. 

At one point they stated, “Insofar 
as we can judge from these and Tucker’s
results, the peaked test has superior
validity for even lower values of
σ_d (higher r_ij) when the test is longer
than thirty items” (4, p. 146). Cronbach
and Warrington’s data on the
effect of test length are scanty, but 
the data of Tucker and Brogden are 
clear, and opposite to what Cron- 
bach and Warrington indicate: The 
longer the test, the greater the region 
of paradox. (See Fig. 1 and 2.) 

An incidental finding of Cronbach 
and Warrington’s study was that the 
peaked tests tended to have bimodal 
or markedly skewed distributions, 
while the tests with dispersed dif- 
ficulties had more nearly normal dis- 
tributions. They did not consider 
the possibility that this finding is
itself a manifestation of the attenuation
paradox. As reason for using
r_bis they stated, “Unlike product-moment
r, r_bis is independent of the
test score metric. Our results are 
therefore invariant as test scores are 
transformed to other scales” (4, p. 
132). But, one may ask, what opera- 
tions correspond to transforming test 
scores to other scales? Given their 
conditions, only their scale results. 
The effect of various changes in 
conditions would have to be investigated.
Indeed, it is generally considered
a requirement for use of r_bis
that the dichotomized variable be 
normally distributed. The effect of 
using r_bis and eta as opposed to r
apparently is to increase the ad- 
vantage of peaked as opposed to 
distributed item difficulties, and thus 
in effect to underestimate the region 
of paradox. 

In summary, the region of paradox 
is most clearly defined in the studies 
of Brogden and Tucker. Neither 
Davis’ study nor that of Cronbach 
and Warrington detected that the 
region of paradox increases with the 
number of test items, an effect clearly 
discernible in the two previous stud- 
ies. If Cronbach and Warrington 
had used the product-moment r in- 
stead of eta they apparently would 
have obtained results comparable to 
those of Brogden and Tucker. Cron- 
bach and Warrington called attention 
to the importance of considering 
items where there is a nonnegligible 
probability of success without ability.
Their method appears better
adapted to further exploration of this
problem than other methods, preferably
with r used as evaluation
function. A valuable extension of
their study would include two-choice
and four-choice items as well as ones
with no probability of success without
ability. Brogden’s method has provided
the most detailed information
concerning the limits of the region of 
paradox and is well adapted to further 
investigation of the relative merits 
of concentrated and distributed difficulties
under circumstances where a
test may be used for groups differing 
in mean ability. Cronbach and 
Warrington’s method could also be
used for the latter problem. Tucker's 
paper, being analytic, is likely to 
prove the most substantial contribu- 
tion to test theory of any reviewed 
herein. 


APPLICATION TO TEST 
CONSTRUCTION 

The statistical phenomena sum- 
marized by the phrase ‘‘the attenua- 
tion paradox” reveal the importance 
of the mean and variability of the 
standardization sample in the evalua- 
tion of a test’s validity. Reading up 
and down Fig. 3, one can interpret 
the curves for differing values of z 
as showing a striking decrement in 
validity for tests composed of median 
equivalent items when they are 
subsequently applied to groups hav- 
ing a different mean on the trait 
measured. Similarly, the validity coefficients
obtained for a hypothetical
set of tests with decreasing disper- 
sion of item difficulties correspond 
to those obtained by considering the 
items as constant but administered 
to groups of increasing variability. 
Generally speaking, the standardiza- 
tion sample should have the same 
mean and variance as the group on 
which the test will ultimately be 
used. In practical situations usually 
no such identity can be guaranteed, 
certainly not for the life expectancy 
of a well-constructed test. 


For tests in the low homogeneity
or “classical region,” as the variance 
of the sample increases, mean held 
constant, the validity coefficient increases.
Thus, to obtain a lower
bound to the validity coefficient, the 
standardization sample should be at 
most as variable as any sample on 
which the test will be used. 

For tests in the “region of paradox”
the situation is more complicated.
For a given degree of item
intercorrelation and a given number 
of items, there is probably an optimal 
distribution of item difficulties. The 
higher the item inter-r, the more it 
is true that each item added to the 
test adds validity only if it differen- 
tiates at a different level of difficulty 
than those items already included. 
Any considerable increase in the vari- 
ability of the group will decrease the 
validity of the test. Thus, to obtain 
a lower bound for the validity co- 
efficient, the standardization sample 
should be at least as variable as any 
sample on which the test will be used. 

Variation in the mean of the appli- 
cation sample from that of the stand- 
ardization group must also be con- 
sidered. Cronbach and Warrington 
concluded that where general use- 
fulness over a period of time is sought 
items should be concentrated around 
median difficulty; however, they con- 
sidered tests as selective instruments, 
which is legitimate but different from 
the present consideration of tests as 
measuring instruments, and their use 
of biserial correlation is question- 
able. Whether concentration of item 
difficulties makes a test more sus- 
ceptible to loss of validity in applica- 
tion to groups of different means is 
a point on which evidence is not 
yet available. 

Theoretical and practical consider- 
ations may be drawn together as 
follows: When application to a single 
group is considered, concentration of 
item difficulties is called for in the 
case of item inter-r’s usually met,
which are low. Consideration of use
of tests for differing groups will lead
to dispersion of item difficulties for 
some cases, i.e., where item inter-r’s 
are not too low. In practice, however, 
it is usually not possible to find large 
numbers of items exactly at the 50-50 
point. The consequent inevitable dis- 
persion of difficulties is in many 
cases probably desirable. A method 
of item selection that favors median 
items but does not exclude good items 
with more extreme difficulties seems 
indicated by these considerations.
One such method has been proposed
recently (13), and no doubt other
methods have this property. That
one widely used method of item
selection has a very different effect
will be shown in the next section.


IMPLICATIONS FOR TEST THEORY 


Consider the definition: A para- 
doxical property of a test is a property 
such that the validity of the test is 
not a monotonic function of that 
property; i.e., validity is sometimes 
an increasing and sometimes a de- 
creasing function of the property. 
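
The definition can be made concrete with a minimal sketch. It assumes that validity has been tabulated at successive, increasing values of some test property; the function name and the figures are hypothetical and are not taken from the studies reviewed here.

```python
def is_paradoxical(validities, tol=0.0):
    """Return True if validity both rises and falls as the property increases.

    `validities` lists the validity observed at successive, increasing
    values of the property (hypothetical figures in the example below).
    """
    steps = [later - earlier for earlier, later in zip(validities, validities[1:])]
    rises = any(step > tol for step in steps)
    falls = any(step < -tol for step in steps)
    return rises and falls


# Validity climbs and then drops as the property increases: paradoxical.
print(is_paradoxical([0.30, 0.50, 0.60, 0.55]))
# Validity only climbs: a monotonic relation, hence not paradoxical.
print(is_paradoxical([0.30, 0.50, 0.60]))
```

The tolerance argument is a design convenience: with sampled validities one would probably not want a trivially small reversal to count as a paradox.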

The objections to taking validity 
as the focal concept for test theory 
are well known. Probably all psycho- 
metricians would agree that test 
theory must have basic concepts 
which refer to intrinsic properties of 
tests. Is it not intuitively valid, 
however, to demand that the most 
basic concept of psychometrics shall 
be a nonparadoxical property of tests? 
Are there such nonparadoxical prop- 
erties? 

Clearly reliability, or one of its
cognate concepts, is the focus of
present-day test theory; the statistical
theory of reliability is the bulk of
classical test theory. There have
been many criticisms of the reliability
concept; those by Thorndike (17),
Cronbach (2), and Loevinger (11)
have a good deal in common,
particularly the distinction between
stability and homogeneity, which is lost in
the ordinary usage of “reliability.”
The purpose of the present article 
is not, however, to review all the 
objections to the reliability concept 
but to draw attention to a single in- 
stance in which the statistical theory 
of reliability leads to self-contradic- 
tion. 

Such solutions and explanations of 
the attenuation paradox as have been 
proposed, those of Brogden, Davis, 
and Cronbach and Warrington, are 
appeals to common sense completely 
outside traditional theory of relia- 
bility. Perhaps the most paradoxi- 
cal aspect of the attenuation paradox 
is that Gulliksen, who appears to 
deserve credit for discovering it, 
failed to include any reference to it 
in his comprehensive summary (8)
of mental test theory. In his own
words, “Current test theory provides
no rationale for rejecting” the
anomalous score distribution.

Lord (14) has probably come clos- 
est to integrating the attenuation 
paradox with classical test theory. 
He points out that while Pearsonian
correlation between test score and
ability decreases in what is here
called the region of paradox, the
curvilinear correlation with ability
constantly increases with increasing
item inter-r. Unfortunately, however,
consideration of curvilinear correlation
obscures the paradox, which
does in fact exist. Lord concerns
himself chiefly with the classical
region; it is not clear that his approach
could lead to the distinction between
the two regions of test theory.

Reliability is paradoxical; saturation,
a concept imbedded in a method
of test construction recently proposed
by Loevinger, Gleser, and DuBois
(13), is virtually identical with one
of the Kuder-Richardson (10)
coefficients and thus is paradoxical in
the same region. Guttman’s (9) con- 
cept of scalability and Loevinger’s 
(11) concept of homogeneity clearly 
differ from reliability and saturation. 
Are these properties paradoxical? 
Consideration of the results quoted 
in the second section of the present 
paper shows that these properties are 
also paradoxical, not, however, in 
what is here called the region of 
paradox but in the classical region. 
To distinguish the two kinds of para- 
doxical properties, scalability and 
homogeneity may be called neo- 
paradoxical properties. Only scale 
theory will be considered. Its im- 
portance is pointed up by the fact 
that virtually all recent contributions 
to test theory in sociology and closely 
related disciplines are phrased in 
terms of scale analysis. 

Guttman originally measured scalability
in terms of a “coefficient of
reproducibility,” which is the proportion
of the responses of the group
which can be reproduced from knowledge
of item difficulties and total
scores. Guttman distinguished “scales”
from “quasi-scales” according to
whether the coefficient of
reproducibility exceeded .9, assuming
certain other conditions were satisfied.
Festinger (6), Loevinger (12),
and probably others criticized the
distinction between scales and
quasi-scales as arbitrary. According to
these writers, Guttman and his followers
were erecting a qualitative
difference out of a purely quantitative
one. The considerations of the present
paper, however, justify what was
apparently a purely intuitive distinction
on the part of Guttman.
When we deal with cumulative (12)
dichotomous items, which is the usual
case (16), the distinction between the
two regions of test theory can be
clearly, if laboriously, drawn. What
Guttman called quasi-scales are tests
in what is here called the classical 
region; what Guttman called scales 
are tests in what is here called the 
region of paradox. There is no evi- 
dence, however, that workers in the 
field of scale analysis are aware of
the precision with which this dis- 
tinction can be made, nor of the con- 
sequences of the distinction which 
have been elaborated by psycholo- 
gists working in test theory. 

A number of objections have been 
raised against the original coefficient 
of reproducibility, a major one being 
that its lower limit can be arbitrarily 
raised by selecting items extreme with 
respect to difficulty (or popularity). 
In practice, workers in this field have 
probably always more or less taken 
this fact into consideration. A recent 
paper by Menzel (15) has proposed a 
modification of the coefficient of re- 
producibility which eliminates this 
problem. Clearly, insofar as use of 
Guttman’s coefficient led to the selec- 
tion of items extreme in difficulty as 
opposed to median items, it led to a 
decrease in validity of the resultant 
test, since under none of the condi- 
tions investigated above did a selec- 
tion of extreme rather than median 
items lead to an increase in validity. 

Stouffer, Borgatta, Hays, and 
Henry (16) report that, in practice, 
in order to derive scales rather than 
quasi-scales from available data, it 
has been necessary to select one from 
among several apparently equally 
good items at any given difficulty 
level. According to them, probably 
the most common method of im- 
proving scalability has been reduction 
of number of items per test, often to 
no more than four or five items. 
Inspection of Figs. 1 and 2 and other
available data reveals no set of
conditions under which reduction in
number of items increases validity;
on the contrary, increase in number
of items invariably increases validity, 
most markedly when the number of 
items is small. 

A further consequence of using the 
concept of perfect scales as the de- 
sideratum in test construction has 
been that test constructors have been 
forced to select items that are more 
or less evenly spread in difficulty. 
Such selection will tend to increase 
scalability, but in the classical region 
it will tend to decrease validity. 

In summary, Guttman’s coeffi- 
cient of reproducibility gives advan- 
tage to extreme as opposed to median 
items. If this coefficient is modified 
so as to give advantage neither to 
extreme nor to median items, pursuit 
of scalability still leads to decrease 
in number of items and to dispersion 
of item difficulties. But the evidence 
cited in this review is that decrease 
in number of items always leads to 
decrease in validity, other things 
equal; there is no set of conditions 
under which extreme items lead to 
more valid tests than median items; 
and in the classical region, dispersion 
of item difficulties leads to decrease 
in validity. There is reason to be- 
lieve that most of the attitude tests 
currently in use fall in the classical 
region. One may conclude that ex- 
tensive use of scale analysis has al- 
most certainly led to loss of validity 
in tests used in sociology. 

In recognition of the difficulties 
to which scale analysis has led, more 
or less akin to those cited here,
Stouffer et al. (16) have proposed a
modification of the method. They continue
to have about five items per scale,
but each item is a “contrived item.”
A contrived item is a composite of
several items similar in level of
difficulty, but each contrived item is
scored zero or one, depending on the
number of pluses in the items of
which it is composed. The contrived
items are constructed by a cut-and- 
try method. The authors show that 
tests constructed by their method are 
superior to those constructed by the 
more common method of scale analy- 
sis with respect to scalability and 
some definition of reliability, but 
only where the original items are 
quite good. Their method appears 
to be a compromise between scale 
analysis and methods favoring choice 
of equivalent items, to which most 
psychologists would lean. While the 
new method may ameliorate the 
flaws of scale analysis, it does not 
solve the conceptual difficulty. Scal- 
ability, like reliability, is a paradoxi- 
cal concept. Worse, scalability is 
paradoxical in the classical region, 
which is heavily populated with 
tests, while coefficients such as the 
Kuder-Richardson (10) reliability co- 
efficients are paradoxical only in the 
region of paradox, which is thinly 
populated with tests.⁴

The problem of finding nonpara- 
doxical properties of tests remains. 
A property different from those dis- 
cussed above which may qualify is 
the discriminating power of the test. 
Many writers have used this term 
without definition, as if it were
self-explanatory. Several recent papers
have provided indices of discriminating
power; full consideration of these
indices would lead far afield. A single
approach will be selected arbitrarily
to show that this concept differs
from concepts like reliability and
scalability.

4 Dr. Ledyard Tucker has, however, called
my attention to the fact that a problem being
worked on by John Keats at Princeton
University provides a mathematical model for
the method of “contrived items.” Keats
assumes that the probability of success on an
item increases with ability at only one point
on the scale of ability, rather than describing
an ogive. It is not assumed that the probability
of success on the item is zero below that
point nor unity above it, only that it is
constant everywhere but at the single point.
One has difficulty thinking of test content for
which this assumption is as reasonable as the
assumption of a graded increase in the
probability of success on items.

Loevinger, Gleser, and DuBois (13) 
have distinguished three aspects of 
discriminating power: fineness, prob- 
ability, and range. Discriminating 
fineness refers to the size of the dif- 
ferences in the trait which can be 
discriminated. Discriminating probability
refers to the proportion of
discriminations which are in the same
direction as trait differences.
Discriminating range refers to the
general level of the trait at which
discriminations are made. No single
coefficient measures the three aspects
of discriminating power; however,
some coefficients are closely related
to discriminating fineness, some to
discriminating probability, and some
to both. If the test construction
process is conceived as beginning
with the best few items in a finite
pool and adding items one at a time
from the pool in the order of the
goodness of the items, then coefficients
measuring only fineness constantly
increase, those measuring
only probability constantly decrease,
while those measuring both may
increase at first and then decrease.
The latter type of coefficient provides
a basis for deciding when to
stop adding items to the test.

This definition of discriminating
power may prove unappealing to
many psychometricians, since no single
coefficient corresponds to it
and since it is essentially an intuitive
rather than a quantitative concept.
Yet other quantitative disciplines 
begin frankly with intuitive concepts. 
As Ruth Tolman (18) has recently 
observed, among physical scientists 
it is the highest compliment to speak 
of a fellow scientist as having
“physical intuition,” but among those
psychologists who most desire to emulate
the physical scientists, intuition is
rarely referred to in a similar
complimentary sense.

In summary, a paradoxical property
of a test is a property such that
validity is not a monotonic function of that
property. Reliability is a property
paradoxical in a region lightly
populated with tests. Scalability is a
property paradoxical in the region
heavily populated with tests. Other
possible properties, such as the
discriminating power of the test, have
not been fully investigated.


REGRESSION ANALYSIS: PREDICTION FROM
CLASSIFIED VARIABLES¹
ROBERT M. GUION
State University

Occasionally in psychological research
a qualitative independent variable
seems promising in the prediction
of a quantitative variable. Sometimes,
particularly in applied research,
data for basically continuous
independent variables must be obtained,
if at all, in terms of broad,
discrete categories, possibly of
unequal intervals. Such variables are
at best inconvenient to work with
in prediction problems; all too often,
the difficulties they present are solved
by ignoring them, resulting in the
loss of a potentially useful predictor.

In a study of supervisors (2),
the writer was faced with this problem.
It was necessary to develop a
regression equation by which a
dependent variable (number of employees
supervised) could be predicted
from such independent variables as
man-hours of assistance, rough
classifications of plant size, and the purely
qualitative variable of industry
classification. The problem was solved by
a technique of regression analysis.
Regression analysis is a least-squares
regression technique permitting
the prediction of a quantitative
variable from categorized or
qualitative variables.² There seems
to be no particular limit to the number
of variables that can be handled
by this method, as there is in the
usual multiple regression techniques.
Apparently the absence of products
in the resulting regression equation
reduces the likelihood of cumulating
to a serious degree the errors of
measurement. It is, of course, possible
that spurious correlation might result
from including too many variables.

1 This article describes a procedure used in
a thesis submitted to the Graduate School of
Purdue University in partial fulfillment of the
requirements of the degree of Doctor of
Philosophy, carried out under the direction
of Dr. C. H. Lawshe, and sponsored by the
Purdue Research Foundation.

2 The theory and procedure presented here
under the name of regression analysis were
developed primarily by Professor I. W. Burr
of the Statistical Laboratory of Purdue
University. It was subsequently learned that
similar procedures based upon the same
mathematical theory have been applied to
biometric data.

It is beyond the intent of this report
to give a complete account of
the theory and practice of regression
analysis. The procedure, which has
been used in agricultural research, is
capable of further refinement and
needs evaluative research. A description
of the basic theory and of
the method's application in agriculture
is provided by Anderson and
Bancroft (1). This report seeks merely
to outline the procedure as modified
in the writer's use of it.

The technique will be discussed and
outlined by example, using a
three-variable situation that might occur
in personnel research. For purposes
of illustration, we will be concerned
with predicting the average number
of working days attendance per
quarter year of machine operators.
The prediction will be based upon a
knowledge of the applicant's
categorical standing in each of three
variables:

Variable A: Preference for mechanical work

1. First or second choice
2. Third choice or not indicated

Variable B: Housing status

1. Own home
2. Rent home
3. Rooming

Variable C: Presently employed?

1. Yes
2. No


There are seven categories in all 
in this example; any applicant may 
be classified as being in three of them. 
Regression analysis seeks to estimate 
eight parameters: $\alpha_1$, $\alpha_2$ for the two
categories of variable A; $\beta_1$, $\beta_2$, $\beta_3$ for
the three categories of variable B;
$\gamma_1$, $\gamma_2$ for the two categories of
variable C; and $\mu$ for the mean attendance
(dependent variable) of the general
population of applicants. The sum
of the parameters for the categories
describing any individual, when
added to the population mean, yields
the dependent measure for that
individual within limits of error, such
that the regression model is

$$x = \alpha_i + \beta_j + \gamma_l + \mu + \text{error}.$$

From a sample available for study,
we can estimate these parameters
(estimates indicated by italic letters)
so that

$$x = a_i + b_j + c_l + m.$$


In other words, an adjustment fac- 
tor is assigned to each category of 
each variable. For any individual 
applicant in the illustrative situa- 
tion, the predicted attendance record 
would be equal, within limits of error, 
to the mean attendance record of the 
total sample plus the algebraic sum 
of the adjustment factors of the cate- 
gories that describe him. 
A least-squares solution of these
parameter estimates is sought by 
using a system of eight simultaneous 
equations (the number of categories 
plus one) by the following procedure: 

1. Prepare the matrix of the system 
of equations. The equation for the 
first category of the first variable is 


$$\sum X_{a_1} - n_{a_1} a_1 - n_{a_1|b_1} b_1 - n_{a_1|b_2} b_2 - n_{a_1|b_3} b_3 - n_{a_1|c_1} c_1 - n_{a_1|c_2} c_2 - n_{a_1} m = 0.$$

$\sum X_{a_1}$ is the sum of the measures of
the dependent variable (i.e., attendance
in days) for all cases appearing
in the first category of variable A:
those who have indicated on the
application blank a first or second
choice for mechanical work. The
frequency, or number of cases, appearing
in this category is designated
$n_{a_1}$. The notation $n_{a_1|b_1}$ indicates the
number of cases in this category
which are also classified in the first
category of variable B: those who
own their homes. The next six equations
follow this form, being equations
for the second category of
variable A, the first category of
variable B, and so on.

The final equation is for the general
mean and is

$$\sum_{i=1}^{N} X_i - n_{a_1} a_1 - n_{a_2} a_2 - n_{b_1} b_1 - n_{b_2} b_2 - n_{b_3} b_3 - n_{c_1} c_1 - n_{c_2} c_2 - N m = 0.$$

Transposing all parameter estimates
and their coefficients yields a
matrix equal to the vector of the
dependent variable. This 8 by 8 matrix
of independent variables consists of
a set of frequencies as coefficients of
the unknown parameter estimates.
The complete system of equations, in
matrix form blocked off for illustrative
purposes, is shown as Table 1.

It will be seen that the total frequency
in any given category appears
in the diagonal, and that the
coefficients of the diagonal are equal to the
coefficients of the last equation for
the general mean.
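
The tabulation in step 1 can be sketched in a few lines, assuming each case is coded as a row of 0/1 indicators (one per category, plus a final 1 for the general mean). The eight records below are hypothetical and are not the 49 cases of the illustration; `indicator_row` and the other names are likewise introduced only for this sketch.

```python
import numpy as np

# Hypothetical records: (category of A, category of B, category of C, attendance).
records = [
    (1, 1, 1, 58), (1, 2, 2, 55), (1, 3, 1, 52), (2, 1, 2, 60),
    (2, 2, 1, 54), (2, 3, 2, 51), (1, 3, 2, 53), (2, 2, 2, 57),
]
n_categories = (2, 3, 2)  # variables A, B, C


def indicator_row(a, b, c):
    """0/1 flag for every category of A, B, C, then a 1 for the general mean."""
    row = []
    for value, size in zip((a, b, c), n_categories):
        row += [1.0 if value == k else 0.0 for k in range(1, size + 1)]
    return row + [1.0]


X = np.array([indicator_row(a, b, c) for a, b, c, _ in records])
y = np.array([float(rec[3]) for rec in records])

coefficients = X.T @ X  # 8 by 8 matrix of frequencies
vector = X.T @ y        # sum of the dependent variable within each category

# The diagonal repeats the last row (the equation for the general mean).
print(np.array_equal(np.diag(coefficients), coefficients[-1]))
# The A-by-B block sums to the number of cases.
print(coefficients[0:2, 2:5].sum() == len(records))
# The full system is rank deficient, so it has no unique solution as it stands.
print(np.linalg.matrix_rank(coefficients))
```

Forming the products `X.T @ X` and `X.T @ y` is only one convenient way to obtain the frequencies; a direct cross-tabulation of the records gives the same matrix.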

The sample used in this illustra- 
tion consists of 49 machine operators, 
26 of whom have indicated machine 
work either as their first choice or 
second (first category, variable A). 
Of these 26 men, 7 own their homes, 6 
are renting, and 13 are rooming or 
living with their parents (the three 
categories of variable B, respectively).
They totaled 1,417 working
days attendance during a
three-month period.

Blocking the matrix off as shown
in Table 1 provides a convenient
check on the accuracy of the tabulations
made: the sum of the frequencies
in any given block must be equal
to the total number of cases in the
sample, unless there are individuals
included for whom complete data
are not available. In any column
within a block, the sum of the
coefficients is equal to the coefficient
in that column in the equation for
the general mean.

2. Reduce the matrix. This system
of equations, as it now stands, is
additive since the sum of the coefficients
of the equations for each variable
yields the appropriate coefficients
for the equation for the general
mean. This set of equations does
not have full rank and therefore has
no unique solution.

A solution can be obtained by
making the restriction that the sum
of the estimated parameters, or a
weighted sum, be equal to zero for
any given variable. With this restriction,
the process of reparametrization,
discussed by Kempthorne (3),
can be applied. Reparametrization
seeks to replace certain parameters
and solve for a different set. It can
be done by setting one parameter in
each variable equal to zero, which 
results in each case in the elimination 
from the matrix of one row and one 
column through the diagonal element
of that parameter.³ In the
illustrative case, eliminating the last
category of each variable (coefficients
in Table 1, italicized), we have the
following reduced 5 by 5 matrix of
equations to solve instead of the
original 8 by 8:

$$\left[\begin{array}{rrrrr|r}
26 & 7 & 6 & 11 & 26 & 1417 \\
7 & 13 & 0 & 10 & 13 & 743 \\
6 & 0 & 15 & 6 & 15 & 842 \\
11 & 10 & 6 & 23 & 23 & 1260 \\
26 & 13 & 15 & 23 & 49 & 2700
\end{array}\right]$$

This reduced matrix can be solved.⁴
The solutions for the illustrative
problem are shown in Table 2.

3. Solve the equations. There are, 
of course, many methods of solving 
sets of simultaneous equations. However,
in systems as large as will be
found in most cases of regression
analysis, loss of significant figures in
the course of computation will become
a serious problem if direct
reduction or a determinant method
is used. In the supervisory study
cited, for a matrix of 27 equations,
the Gauss-Seidel iterative method
was used. The 134 iterations required
to reach a solution were performed on
the IBM Card-Programmed Calculator
using a procedure described by
Liggett (4).
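
As a rough modern illustration of step 3, the sketch below applies a Gauss-Seidel iteration to the reduced 5 by 5 system and compares it with a direct solver. The matrix entries are an assumption: they are the frequencies and sums of the illustration as far as they can be read, with the remaining cells filled from the row and column totals, so small differences from the published solutions are possible. The function name, tolerance, and iteration cap are also choices made only for this sketch.

```python
import numpy as np

# Reduced system; unknowns are a1', b1', b2', c1', m' (entries assumed, see above).
A = np.array([
    [26.0,  7.0,  6.0, 11.0, 26.0],
    [ 7.0, 13.0,  0.0, 10.0, 13.0],
    [ 6.0,  0.0, 15.0,  6.0, 15.0],
    [11.0, 10.0,  6.0, 23.0, 23.0],
    [26.0, 13.0, 15.0, 23.0, 49.0],
])
y = np.array([1417.0, 743.0, 842.0, 1260.0, 2700.0])


def gauss_seidel(A, y, tol=1e-10, max_iter=100_000):
    """Solve A x = y by sweeping through the equations, reusing updated values."""
    x = np.zeros_like(y)
    for iteration in range(1, max_iter + 1):
        previous = x.copy()
        for i in range(len(y)):
            # Contribution of all other unknowns, using their newest values.
            others = A[i] @ x - A[i, i] * x[i]
            x[i] = (y[i] - others) / A[i, i]
        if np.max(np.abs(x - previous)) < tol:
            return x, iteration
    raise RuntimeError("Gauss-Seidel did not converge")


solution, n_iter = gauss_seidel(A, y)
direct = np.linalg.solve(A, y)
print(np.round(solution, 3), n_iter)
print(np.allclose(solution, direct, atol=1e-6))
```

The iteration is expected to converge here because the reduced matrix of frequencies is symmetric and positive definite; for a system this small the direct solver is of course simpler, and the iterative form is shown only because it is the method the text names.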


3 The simplification of reparametrization is
largely the result of work done by L. E.
Grosh, Cheatham, and Professor V. L.
Anderson of the Statistical Laboratory of
Purdue University.

4 Recognition should be given Raymond
Woods, Department of Mathematics, Bowling
Green State University, for his labor in solving
the problem used here for purposes of
illustration.
TABLE 1

MATRIX OF COEFFICIENTS OF THE UNKNOWN PARAMETER ESTIMATES AND
VECTOR OF DEPENDENT VARIABLE

                      Coefficients of Parameters             Vector
Categories       a1   a2*   b1   b2   b3*   c1   c2*    m     (ΣX)

Variable A
  Category 1     26    0     7    6   13    11   15    26     1417
  Category 2*     0   23     6    9    8    12   11    23     1283

Variable B
  Category 1      7    6    13    0    0    10    3    13      743
  Category 2      6    9     0   15    0     6    9    15      842
  Category 3*    13    8     0    0   21     7   14    21     1115

Variable C
  Category 1     11   12    10    6    7    23    0    23     1260
  Category 2*    15   11     3    9   14     0   26    26     1440

General Mean     26   23    13   15   21    23   26    49     2700

Note: Rows and columns marked * hold the values (italicized in the
original) that are eliminated in reparametrization.



4. Convert the obtained values. The
solution after reparametrization
yields for the illustration the values
$a_1'$, $b_1'$, $b_2'$, $c_1'$, and $m'$, since the last
category of each variable had been
set equal to zero. Conversion of
these estimates to estimates of the
original set of parameters is
accomplished by simple algebra, the
actual process depending upon the
restriction first imposed. One of
three restrictions might have been
made:

1. The sum of the estimated parameters
(or adjustment factors) can be
set equal to zero, if the population
can be assumed to be equally
distributed among the various categories
of the variable.

2. The sum of the estimated parameters
of each variable, weighted
according to the observed frequencies
of the categories, can be set equal to
zero, if it is assumed that the sample
is random.

3. The sum of the estimated parameters
of each variable, given a
priori weights on the basis of known
or hypothesized population frequencies,
can be set equal to zero.

TABLE 2

SOLUTIONS TO ILLUSTRATIVE EQUATIONS

Parameter    Prime Solution*    Parameter Estimate
a1               -1.048              -0.492
a2                                   +0.556
b1               +4.918              +2.689
b2               +3.020              +0.791
b3                                   -2.229
c1               -1.865              -0.990
c2                                   +0.875
m               +54.316             +55.102

* Solution to reparametrized matrix

The second restriction seems more 
common and will be used in the example.
Following Kempthorne (3), it
can be shown that $a_i' + a_k = a_i$. It
is therefore necessary to find the kth
value for each variable, i.e., the value
eliminated in reparametrization. The
restriction is that $\sum_{i=1}^{k} n_{a_i} a_i = 0$, or, in
the example where variable B has
three categories, $\sum_{i=1}^{3} n_{b_i} b_i = 0$.
Substituting, this becomes in full

$$n_{b_1}(b_1' + b_3) + n_{b_2}(b_2' + b_3) + n_{b_3} b_3 = 0.$$

Simplifying and solving for $b_3$,

$$b_3 = -\frac{n_{b_1} b_1' + n_{b_2} b_2'}{n_{b_1} + n_{b_2} + n_{b_3}},$$

or, in the illustration,

$$b_3 = -\frac{(13)(4.918) + (15)(3.020)}{13 + 15 + 21} = -2.229,$$

and

$$b_1 = b_1' + b_3 = 4.918 - 2.229 = 2.689,$$
$$b_2 = b_2' + b_3 = 3.020 - 2.229 = 0.791.$$

The same general procedure is used 
to obtain the correct parameter esti- 
mates, or adjustment factors, for 
variables A and C, shown in Table 2. 
For the general mean, it is best to 
use the simple arithmetic mean as 
normally computed because of the 
rounding errors which creep into the 
corrected value after the solution 
through reparametrization. In this 
problem, the mean is 55.102. 

With these values known, the linear 
equation 


$$x = a_i + b_j + c_l + m$$

can be used for each applicant. In
the illustration, we can now, by
knowing the applicant’s classification 
in the three variables and by knowing 
the mean attendance of the sample, 
predict the applicant’s job perform- 
ance in terms of expected attendance 
per quarter. 

For example, an applicant who lists 
mechanical work as his first job 
choice (category $a_1$, -0.492), who
owns his home (category $b_1$, +2.689),
and who is presently employed
(category $c_1$, -0.990) would be expected
to have a quarterly attendance record,
in integral units, of 56 working
days, the prediction formula being

$$x = 55.102 - 0.492 + 2.689 - 0.990 = 56.309.$$

The relative influence of each independent
variable can be gauged according
to the variance of the parameter
estimates, such that, in this
problem, variable B carries a heavier
weight in prediction than variable C,
which is in turn weighted more heavily
than variable A.

It should be pointed out that this 
problem has been illustrative only; 
where the units are discrete rather 
than continuous, which is true of 
both the frequencies of individuals 
in various categories and of the meas- 
urements of attendance, there is 
probably very little justification in 
attempting to achieve greater than 
integral accuracy. 
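
The prediction rule, with rounding to integral units, can be sketched as a simple table lookup. The adjustment factors for the first categories and the mean are those quoted in the worked example; the values for the second category of A and of C are an assumption, obtained here from the frequency-weighted restriction, and the names `MEAN`, `ADJUSTMENTS`, and `predict` are introduced only for this sketch.

```python
MEAN = 55.102  # arithmetic mean attendance of the sample

# Adjustment factor for each category of each variable.
ADJUSTMENTS = {
    "A": {1: -0.492, 2: 0.556},            # category 2 value assumed (see lead-in)
    "B": {1: 2.689, 2: 0.791, 3: -2.229},
    "C": {1: -0.990, 2: 0.875},            # category 2 value assumed (see lead-in)
}


def predict(a, b, c):
    """Mean attendance plus the adjustment factors of the applicant's categories."""
    return MEAN + ADJUSTMENTS["A"][a] + ADJUSTMENTS["B"][b] + ADJUSTMENTS["C"][c]


expected = predict(1, 1, 1)  # first choice, owns home, presently employed
days = round(expected)       # integral units, as the text recommends
print(round(expected, 3), days)  # 56.309 56
```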


CONCLUDING COMMENTS 


Regression analysis is a least-squares
multiple regression technique
permitting the prediction of quantitative
variables from qualitative or
classified variables. As outlined here,
regression analysis assumes no significant
interactions between variables.
If interaction is found to exist,
additional equations can be introduced,
treating combinations of interacting
variables as separate classifications.
It should be pointed out, for the sake
of economy, that any error made by
a false assumption of no interaction
is an error of underestimating rather
than overestimating the relationship;
the effect is one of ignoring additional
variables which could have increased
the predictive efficiency.

The regression equation derived
from this method does not directly
provide certain interpretational data.
However, the predicted values of the
dependent measure can be correlated
by conventional procedures with the
actual values. These correlation
coefficients can then be used to derive
coefficients of determination and
standard errors of estimate.

The potential uses of the method
are many. Perhaps the most obvious
are its applications to personal history
analysis and to profile analysis.
Nevertheless, it should be made the
subject of empirical investigations.
Several research questions need
answering. Its predictive effectiveness
in comparison with other procedures,
such as the Horizontal Per Cent
Method (5, p. 256), needs to be
determined, as does the effect of coarse
grouping or of the number of variables
used.


REFERENCES

1. ANDERSON, R. L., & BANCROFT, T. A.
Statistical theory in research. New York:
McGraw-Hill, 1952.

2. GUION, R. M. The employee load of first
line supervisors. Personnel Psychol.,
1953, 6, 223-244.

3. KEMPTHORNE, O. Design and analysis of
experiments. New York: Wiley, 1952.

4. LIGGETT. Two applications of the
IBM card-programmed electronic calculator.
Proc. Industr. Computations
Seminar, September, 1950, 62-65.

5. STEAD, W. H., SHARTLE, C. L., & ASSOCIATES.
Occupational counseling techniques;
their development and application.
New York: American Book Co., 1940.

Received September 16, 1953. 





PSYCHOLOGICAL BULLETIN 
Vol. 51, No. 5, 1954 


ESTIMATING THE SCALABILITY OF A SERIES OF
ITEMS: AN APPLICATION OF
INFORMATION THEORY
RICHARD WILLIS
University of Minnesota


The powerful scaling technique 
developed by Guttman and others 
(3, 4, 5, 10) has several advantages, 
some of which seem to be unique. 
The merits and limitations of the
method have been evaluated elsewhere
(1, 2, 6), and they will not be
reviewed here. The present paper 
will be restricted to a consideration of 
the problem of estimating the degree 
of scalability present in the area in 
question; i.e., the degree to which 
items in the scale are measuring the 
same thing in the population mem- 
bers. 

Guttman discusses several factors 
which should be considered in this 
connection, namely, (a) the number 
of items, (b) the number of respond- 
ents, or size of the population sam- 
ple, (c) the number of errors, or re- 
sponses which do not fit the pattern 
of a perfect scale, (d) the distribution 
of these nonfitting responses, (e) the 
number of response categories of 
the items, and (f) the distribution of 
the item marginals. The first three
of these criteria are used to compute
a coefficient of reproducibility $r$,
which is equal to $1 - p_E$, where $p_E$ is
the proportion of errors among the
total number of responses. This
coefficient should equal or exceed
.90. The fourth factor is determined
by an inspection of the response 
pattern. Nonfitting responses should 
be well scattered, indicating a ran- 
dom distribution, rather than appear 
in clusters, which would indicate a 
systematic distortion of the scale 
pattern. The last two criteria are
accounted for by rule-of-thumb
procedures. The more response
categories given to the items individually,
and hence the larger total number of 
response categories, the less likely is 
a scale pattern to appear by chance 
alone. It is recommended that the 
number of items be ten or more if 
they are all dichotomous, so that the 
number of response categories is at 
least twenty. Likewise it is recom- 
mended that there be included no 
more than a few items that have 
almost all responses lumped under a 
single alternative, as such items in- 
evitably boost the coefficient of re- 
producibility. For example, an item 
to which 95 per cent of the respondents
answer “agree” could not possibly
result in more than 5 per cent
nonfitting responses, while an item 
with a 50-50 split between two cate- 
gories could theoretically result in 50 
per cent errors. Items with more than 
two alternatives might produce even 
at best more than 50 per cent non- 
fitting responses if not removed from 
the series. 
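
As a minimal sketch of the arithmetic in this paragraph, the coefficient of reproducibility and the ceiling that the marginals of a dichotomous item place on its nonfitting responses might be computed as follows. The function names and the example counts are illustrative assumptions and are not part of Guttman's procedure.

```python
def reproducibility(n_errors, n_respondents, n_items):
    """Coefficient of reproducibility: r = 1 - p_E, where p_E is the
    proportion of errors among all responses (respondents x items)."""
    total_responses = n_respondents * n_items
    return 1 - n_errors / total_responses


def max_error_share(p_agree):
    """Largest proportion of nonfitting responses a dichotomous item can
    show, given the proportion endorsing one of its two categories."""
    return min(p_agree, 1 - p_agree)


# Hypothetical case: 100 respondents, 10 items, 100 nonfitting responses.
print(reproducibility(100, 100, 10))

# A 95-5 item can contribute at most 5 per cent errors; a 50-50 item, 50 per cent.
print(max_error_share(0.95), max_error_share(0.50))
```

The second function covers only the dichotomous case discussed in the text; items with more than two alternatives would need a separate treatment.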

Because of the effects which the 
number of categories and the distri- 
bution of category frequencies have 
on reproducibility coefficients, they 
are not strictly comparable as gener- 
ally computed. A means of quantita- 
tively accounting for these effects 
would be especially useful because 
it is standard practice to administer 
each item with several categories, 
and then to combine adjacent categories
in an attempt to get r up to
.90. And, of course, the category
frequencies are changed at the same
time. Another common practice in 
which these effects play an important 
part is that of selecting a subgroup of 
items from among those adminis- 
tered. In both cases the manipula- 
tion is selective, favoring the result 
sought by the experimenter, and 
should be corrected for. 

The concept of entropy¹ has been
applied to the theory of information
by Wiener (11) and by Shannon (9).
This interesting concept provides the
means for such a correction. The
entropy H associated with a contingent
event is an indication of the number 
of possible outcomes it can have. It 
is consequently a measure of the un- 
certainty that the event will occur 
in any specified one of these possible 
ways. The toss of a die, with six 
equally likely possible outcomes, in- 
volves more entropy than the toss of 
a coin, with only two equally likely 
outcomes possible. Knowing the 
outcome of a high-entropy situation 
gives us more information, in the 
sense that more uncertainty is re- 
moved, than knowing the outcome of 
a low-entropy situation. 
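
The comparison of the die and the coin can be checked numerically. The following is a small sketch only; the helper name `entropy` is an assumption introduced here, and the quantity it computes is the base-2 entropy, $-\sum p \log_2 p$.

```python
import math


def entropy(probs):
    """Entropy in bits of a set of outcome probabilities.
    Outcomes with zero probability contribute nothing and are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


coin = entropy([0.5, 0.5])   # two equally likely outcomes
die = entropy([1 / 6] * 6)   # six equally likely outcomes

print(coin, die)
print(die > coin)            # the die involves more entropy than the coin
```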

1 It has been so named because of its mathematical
and conceptual similarity to the term entropy used
in statistical mechanics, which in turn was named
after the entropy of classical thermodynamics. For
another application of entropy-like considerations
in psychology, see Miller and Frick (7).

How does this apply to estimating
the degree of scalability present in an
area? If we make our estimation
first under one set of conditions and
then make a second estimation under
a different set of conditions having
a different associated entropy value,
the two estimates are not comparable
unless we can allow for this difference.
This is true whether we have tested
in the same area twice or in two
completely different areas. The situation
is somewhat analogous to comparing
two chi-square values without
considering a difference in the degrees of
freedom. Let us imagine, for example,
that we have given ten trichotomous
items to 100 respondents.
Then we arrange the responses so 
that they form as nearly as possible a 
scale pattern, using any of the avail- 
able methods (3, 5, 10). Say the co- 
efficient of reproducibility r turns 
out to be .80. This does not meet the 
accepted standard of .90, and we 
begin to judiciously combine adjacent 
categories so that we are left with ten 
dichotomous items and an r of .90. 
Is this really any stronger evidence 
that the area is satisfactorily scalable? 
The proportion of responses which 
do not fit the scale pattern has been 
cut in half, but perhaps the restruc- 
turing of the situation has reduced 
the likelihood of such nonfitting re- 
sponses by as much or more. The 
more we manipulate the situation to 
our advantage, the less chance there 
will be for errors to show themselves. 
Our judicious manipulation reduces 
the entropy and imposes greater re- 
strictions on the responses, forcing 
them to more nearly fit the scale pat- 
tern. 

To see how much this chance for 
errors to appear has been reduced, 
let us calculate entropy values for 
conditions before and after combining 
categories. The formula is

$$H = -\sum_{i} p_i \log_2 p_i, \qquad (1)$$

in which $p_i$ is the relative probability
of the event $i$ occurring.² As $p_i$ will
always be less than 1, if there is any
uncertainty whatever about the outcome,
$\log_2 p_i$ will be negative and H
will be positive. The logarithm is
taken to a base of 2 only because H
then takes on a convenient numerical
value: it then indicates how
many times the number of equally
likely possibilities must be halved
to remove all uncertainty about the
outcome. An H value of 1.00 would
represent an event such as the toss
of a coin, in which the equally likely
possibilities are reduced from 2 to 1
when the outcome becomes known.
This same H value would also
characterize a single respondent checking
a dichotomous item, assuming we
have no a priori knowledge about
which category he is more likely to
endorse, i.e., assuming $p_1 = p_2 = .5$. In
either the case of the coin or the single
respondent answering a single
dichotomous item, formula 1 reduces to

$$H = -.5 \log_2 .5 - .5 \log_2 .5 = -.5(-1) - .5(-1) = 1.00.$$

2 Shannon (9) discusses the characteristics
which a good indicator of uncertainty should
have and shows that this is the only form of
expression which does have these characteristics.


But in our imaginary example above, 
we had 100 respondents instead of 
only one, and so we need not assume 
that all the presented choices for 
an item are equally likely to be chos- 
en. We obtain an estimate of the 
relative probabilities from the re- 
sponse frequencies, and the larger 
the group, the better will be our
estimates. The response frequencies give
us a set of $p_i$'s for each item such that
$\sum p_i = 1.00$; using these $p_i$'s, we can
calculate an H value for each item
in the series. Then we sum item H's,
and the result is an H value
characteristic of our series of items. This
value indicates how much information
is obtained (i.e., how much
uncertainty is removed) each time we
administer our item series to a
respondent. The formula may be
written

$$H = -\sum_{j=1}^{J} \sum_{i=1}^{c_j} p_{ij} \log_2 p_{ij}, \qquad (2)$$

where $p_{ij}$ is the relative probability
of category $i$ on item $j$ being
endorsed, J is the number of items in
the series, and $c_j$ is the number of
response categories presented with
item $j$. The summation is first over
categories within items and then over
items. Note that this formula does
not take into consideration the
patterning of the responses. It is the
coefficient of reproducibility which
does this, and that is why the two 
measures must be considered together 
in order to see the whole picture. 
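
The double summation of formula 2 can be sketched in a few lines. The response counts below are hypothetical and serve only to show the mechanics; the function names are likewise assumptions made for this illustration.

```python
import math


def item_entropy(counts):
    """Entropy of one item, with the relative probabilities estimated
    from the response frequencies of its categories."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def series_entropy(items):
    """Formula 2: sum over categories within items, then over items."""
    return sum(item_entropy(counts) for counts in items)


# Hypothetical frequencies for 100 respondents on three items:
# two dichotomous items and one trichotomous item.
items = [
    [50, 50],
    [25, 75],
    [50, 25, 25],
]

for counts in items:
    print(counts, round(item_entropy(counts), 4))
print(round(series_entropy(items), 4))
```

As the text notes, nothing in this computation looks at the patterning of the responses; only the category frequencies enter.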

Returning once more to our exam- 
ple, suppose we compute the entropy 
before, $H_1$, and after, $H_2$, combining
categories with the result that $H_1/H_2 = 2$.
Then the nonfitting responses
have been reduced only in proportion 
to the reduction in entropy. We 
should conclude that the increase in 
r is most likely due to sheer manipu- 
lation. It should be pointed out here 
that increased reproducibility after 
combining categories can mean more 
than this. Such would be the case if 
respondents were differentiating be- 
tween the first category on an item 
and the other two according to their 
true scale position, but were distin- 
guishing between the second and 
third categories only by chance, or 
by habits of expression, or by any 
other extraneous factor. But before 
we could conclude that this sort of 
thing was happening, it seems reason- 
able that we should require that the 
errors, or nonfitting responses, be 
reduced as much as or more than the 
entropy, so that 


$$p_{E_1} / p_{E_2} \geq H_1 / H_2 \qquad (3)$$

where $p_{E_1}$ and $p_{E_2}$ are the relative
proportion of errors before and after
combining categories, respectively.
Lowering H means a greater structuring
of the situation, and as we have
been deliberately structuring it in
our favor, we must demand not only
a reduction in errors, but a reduction 
at least in proportion to this increased 
structuring. 
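
The requirement of formula 3 amounts to a single comparison of two ratios. The sketch below uses a hypothetical function name and hypothetical entropy values; only the error proportions of .20 and .10 come from the running example.

```python
def error_reduction_keeps_pace(p_e1, p_e2, h1, h2):
    """Formula 3: the errors must be reduced at least in proportion to the
    entropy, i.e. p_E1 / p_E2 >= H_1 / H_2 (before / after combining)."""
    return p_e1 / p_e2 >= h1 / h2


# Errors halved (.20 to .10) while entropy falls from 14.0 to 9.0 (assumed values):
print(error_reduction_keeps_pace(0.20, 0.10, 14.0, 9.0))

# Errors halved while entropy falls from 15.0 to 6.0 (assumed values):
print(error_reduction_keeps_pace(0.20, 0.10, 15.0, 6.0))
```

In the first case the errors fall faster than the entropy, so the gain in r is not attributable to restructuring alone; in the second case the entropy drop outruns the error reduction.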

The proportion in formula 3 may 
also be written as 


$$H_2 / p_{E_2} \geq H_1 / p_{E_1}$$

and this suggests the possibility of
setting up a standard $H/p_E$ ratio
with which to compare H to $p_E$ values
obtained in practice. Taking into
consideration the values of both r
and $H/p_E$ would decrease the
possibility of accepting as scalable an
area which is not. It might also
occasionally provide the evidence
needed to accept as scalable an area
about which we would have otherwise
been in doubt. Choosing a
standard ratio, call it $H_s/p_{E_s}$, is
largely a matter of deciding how
conservative we wish to be. It seems
logical to choose $p_{E_s} = .1$, as this
corresponds to an r of .90. We have
a rough guide in choosing $H_s$ in
Guttman's statement (10, p. 79) that at
least ten items should be used if
they are all dichotomies. If we
assume that the $p_i$'s are normally
distributed, so that each $p_i$ is
associated with an equal portion of the
area under the probability curve,
$H_s/p_{E_s}$ then becomes 8.9/0.1 = 89. If
we accept this standard ratio, then
we would require an r of more than
.90 should the entropy of the item
series fall below 8.9. If, for example,
our item series had an H value of 17.8
(which would be extremely high),
an r of .80 would most likely be as
strong an indication of scalability as
the r of .90 in the first example,
although a conservative experimenter
would probably prefer to exceed the
minimum values for both r and $H/p_E$.
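
Under the standard ratio of 89 suggested above, the minimum acceptable r for a given entropy follows directly. This is a sketch of that arithmetic only; the constant and function names are assumptions introduced for illustration.

```python
STANDARD_RATIO = 8.9 / 0.1  # the suggested standard H / p_E ratio, i.e. 89


def max_error_proportion(h):
    """Largest p_E for which H / p_E still meets the standard ratio."""
    return h / STANDARD_RATIO


def min_reproducibility(h):
    """Smallest r = 1 - p_E that meets the standard ratio for entropy h."""
    return 1 - max_error_proportion(h)


for h in (4.45, 8.9, 17.8):
    print(h, round(min_reproducibility(h), 3))
```

An entropy of 8.9 gives back the conventional .90, an entropy of 17.8 gives .80 as in the text, and an entropy below 8.9 raises the required r above .90.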

A previously mentioned situation 
which frequently occurs in practice, 
for which the above considerations
are also in order, is that in which a
subset of items which seems to form
a scale is chosen out of the total
number of items administered.
Choosing such a subset imposes a
greater structuring on the situation
and produces a corresponding H drop.
As in the case of combining categories,
$H/p_E$ as well as r should be noted
before drawing conclusions. Whether
or not an experimenter chooses to
use a standard H to $p_E$ ratio, that
suggested above or any other, it would
seem advisable in any case to report
any manipulation, such as combining
categories or picking out subsets of
items, and the resulting H drops
along with the coefficients of repro- 
ducibility, as this information will be 
helpful in interpreting these r’s. 

When categories are combined or 
subsets of items are picked out ac- 
cording to the usual practice, there 
are actually two distinct processes 
going on simultaneously. First, there 
is the increased structuring and the 
concomitant decrease in entropy. 
Second, there is the process of selec- 
tion carried on by the experimenter in 
which he capitalizes on the oppor- 
tunities which are presented by the 
data. The first process is an auto- 
matic result of decreasing the number 
of items or response categories and 
would obtain no matter whether the 
combining of categories or selection 
of item subsets is done selectively or 
entirely at random. It is the first 
process, the automatic H drop, which
is adjusted for by the $H/p_E$ ratio, but
there remains the problem of account- 
ing for the process of selection. 

It would seem that the only test 
of scalability which allows for this 
second factor is that of cross validation
on a new sample. And obviously,
the greater the H drop, the greater
the possible influence of selection, and
therefore the more essential it becomes
to cross validate. Not only
should the new coefficient of repro- 
ducibility and the new H drop, taken 
together, meet the minimum values, 
but it is also important that the items 
arrange themselves in the same order 
of difficulty or very nearly so. It is 
also desirable that the most
advantageous way of combining
response categories be similar for both
samples. 

There is another situation which 
does not occur often in practice, 
but which nevertheless is interesting 
from a theoretical standpoint. As- 
sume that we wish to interpret a co- 
efficient of reproducibility based on 
some other sample size than the 
usual 100. For some reason only 80 
respondents are available. Can we 
assume that 80 respondents will sup- 
ply us with 80 per cent as much in- 
formation as 100 respondents in the 
sense that the associated H values for
the two testing situations would be in 
the ratio of 80 to 100? This assump- 
tion would not be justified. The proc- 
esses of combining categories and 
selecting a subset of items were both 
selective processes, and thus a definite
structuring of the situation took
place favoring a reduction of the pro- 
portion of nonfitting responses. But 
there is no such selection in the case 
above, and there is no reason to sup- 
pose that the proportion of errors will 
be changed.³ Since the degree of
structuring is not changed, there will
be no change in the associated entropy
value.

3 As R. L. Thorndike has pointed out to the
author, in extreme reduction of the sample size
there will be an appreciable reduction in the
proportion of errors because of the fact that
the responses among which we are looking for
disagreement are also those which determine
the scale values of the respondents. This
spurious consistency is probably negligible,
however, for values of N which are large in
comparison to the number of possible scale
values, which is equal to the number of items
plus one.

Thus it seems reasonable to assume
that the mean of the distribution
function of r is not appreciably
affected by moderate changes in N,
although the standard error of r ob- 
viously will be. When comparing 
two r’s based on different sample 
sizes, we need not make any allow- 
ance for different entropy values; we 
need only keep in mind that the 
smaller the N, the larger the stand- 
ard error of r. As present theory does 
not enable us to write the expression 
for this standard error, we can be 
safe only by avoiding sample sizes 
below the accepted standard of 100. 

Table 1 gives the entropy values
associated with dichotomous items
for various differences between the
response frequency proportions.


TABLE 1

ENTROPY VALUES ASSOCIATED WITH DICHOTOMOUS
ITEMS FOR VARIOUS DEGREES OF RESPONSE BIAS

Item Entropy

1.0000
.9928
.9709
.9341
.8823
.8113
.7219
.6098
.4690
.2864
.0000


More extensive tables of values for
$-\sum p \log_2 p$ have been published (8),
or if necessary, a table of common
logarithms may be used to compute
$\log_2 p$ values by use of the following
relationship:

$$\log_2 p = \log_{10} p / \log_{10} 2 = 3.3219 \log_{10} p.$$
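
The conversion from common logarithms can be tried out directly. The sketch below assumes, as an inference from the text and not something stated in the table itself, that the successive entries of Table 1 correspond to differences of 0, .10, .20, ..., 1.00 between the two response proportions; values computed this way may differ from the printed ones in the third or fourth decimal place. The function names are illustrative.

```python
import math

LOG2_PER_LOG10 = 1 / math.log10(2)  # approximately 3.3219


def log2_via_common_log(p):
    """log2 p obtained from the common logarithm, as in the relationship above."""
    return LOG2_PER_LOG10 * math.log10(p)


def dichotomous_entropy(difference):
    """Entropy of a two-category item whose response proportions differ
    by `difference` (0 for a 50-50 split, 1 when all responses agree)."""
    p = (1 + difference) / 2
    q = 1 - p
    return sum(-x * log2_via_common_log(x) for x in (p, q) if x > 0)


print(round(LOG2_PER_LOG10, 4))
for tenths in range(11):
    d = tenths / 10
    print(f"{d:.2f}  {dichotomous_entropy(d):.4f}")
```
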
NOTE ON SCORE TRANSFORMATION AND
NONPARAMETRIC STATISTICS


WALTER C. STANLEY


Brown University


It is becoming increasingly popular
to assess the statistical significance
of markedly skewed data (e.g., latency
of running or bar-pressing responses in
rats) by means of nonparametric
statistics. This is not surprising in
view of the marked ease of calculation.
However, one source of labor saving,
working directly with raw data rather
than with transformed scores, may
sometimes be a mixed blessing. This
could be the case when working with
ranks of differences between paired
scores. Here some score transformations
would alter the magnitudes of the
differences and thus their ranks.

Consider Wilcoxon's (1) test for
paired replicates applied to the raw
scores in Table 1. These scores could
be latencies in seconds of a running
response in rats obtained under two
conditions of extinction. The largest
difference is in the negative direction.
The smaller of the two sums of ranks
of like signs is therefore 8, and the
difference is not significant, a sum of
4 being required for a p of .05.

In the right half of Table 1, the
same statistical test is carried out on
the differences in the reciprocals of
these scores. Again one rank is
assigned a negative sign, but here this
rank is 1, and the p obtained is
between .02 and .01.


TABLE 1

ILLUSTRATIVE DATA ANALYZED IN TWO WAYS
WITH WILCOXON'S PAIRED REPLICATES TEST

             Reciprocals
Cond. A      Cond. B         Diff.

 1.00        [illegible]      .67
  .50        [illegible]      .38
  .33        [illegible]      .19
  .25        [illegible]      .14
  .20        [illegible]      .03
  .04        [illegible]      .02
  .50        [illegible]      .30
  .14        [illegible]      .07

[The raw-score half of the table is illegible in the source.]


Clearly some consideration must
be given to the meaningfulness of
scale units, and more generally, to
the population of values to which
one wishes to generalize in order to
take full advantage of this “rapid
approximate” statistical technique.

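
The effect described in this note can be reproduced with a short sketch. The latencies below are hypothetical values chosen only to show the pattern (they are not offered as the note's own data), and the rank-sum helper is a plain reimplementation written for this illustration, not Wilcoxon's published tables or any library routine.

```python
def signed_rank_sums(first, second):
    """Return (sum of ranks of positive differences, sum of ranks of
    negative differences) for paired scores, ranking the absolute
    differences and giving tied values their average rank."""
    diffs = [x - y for x, y in zip(first, second) if x != y]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = average_rank
        start = end + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return positive, negative


# Hypothetical latencies in seconds under two conditions (assumed data).
cond_a = [1, 2, 3, 4, 5, 25, 2, 7]
cond_b = [3, 8, 7, 9, 6, 16, 5, 14]

raw_sums = signed_rank_sums(cond_b, cond_a)
reciprocal_sums = signed_rank_sums([1 / x for x in cond_a],
                                   [1 / x for x in cond_b])

print(min(raw_sums))         # smaller rank sum for the raw scores
print(min(reciprocal_sums))  # smaller rank sum for the reciprocals
```

With these assumed scores the one discrepant pair carries the largest rank among the raw differences but the smallest rank among the reciprocal differences, so the same data lead to quite different test statistics.
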
OSGOOD, CHARLES E. Method and
theory in experimental psychology.
New York: Oxford Univer. Press, 
1953. Pp. vii+800. $10.00 ($14.00 
trade edition). 


According to the preface, this book 
was written “to provide undergradu- 
ate majors and graduate students in 
psychology with a text that evaluates 
experimental literature in close relation
to critical theoretical issues.”
For such use it has several virtues.
Not only are the theoretical issues 
about which the book is centered im- 
portant in their own right, but they 
also provide schemata that should 
help the student to comprehend and 
retain the myriad facts of experi- 
mental research. Osgood’s descrip- 
tion of many experiments is so mi- 
nute as to be the next best thing to 
the originals themselves. Further- 
more, within its range, the book re- 
flects with marvelous faithfulness the 
problems and atmosphere of experi- 
mental psychology as this is repre- 
sented in the latest volumes of Ameri- 
can psychological journals. 

This timeliness is apparently not 
due to unusual recency in the cita- 
tions. A sample of 100 was drawn 
from the 1,290 references in the 
bibliography; when these were dis- 
tributed by publication date, their 
median date was found to be 1937. A 
similar sample of 100 from the approximately
1,800 references in
Woodworth’s Experimental Psychol- 
ogy of 1938 (with which Osgood’s 
book is bound to be compared) 
yielded a median date of 1922. Thus,
the median ages of the citations at
the time of the publication of the
two books were both 16 years. (Is
there a law here?) Woodworth's
references are, however, more skewed
toward early dates than Osgood’s. 

On the other hand, the book is 
both massive and difficult. Two or
three pages on vacuum tube ampli- 
fiers and cathode ray oscillographs 
will be insufficient for the student who 
is untrained in physics and scarcely 
needed for one so trained. There is 
a bit of mathematics here and there, 
but probably not enough to frighten 
the bright and well-motivated stu- 
dents for whom the book must have 
been written. The first 300 pages pre- 
suppose rather extensive acquaint- 
ance with neural anatomy and neu- 
rophysiology; a few anatomical 
drawings, including maps of cortical 
cytoarchitecture, would have been 
helpful here. 

Then, too, for all of its 727 pages 
of text, the book is limited in scope. 
The author says that he has covered 
“...the major part of what is 
called experimental psychology, in- 
cluding sections on sensory processes, 
perception, learning, and symbolic 
processes.” In 1938, Woodworth
had chapters on these topics plus
three on feeling and emotion and one
each on “experimental esthetics,”
GSR, reaction time, attention, and 
reading. The lack of material on 
motivation and action, which are 
treated at least in part by such a 
recent book as Stevens' Handbook, as
well as other topics that other readers 
will no doubt miss, will be felt keenly 
in any broadly conceived proseminar 
in which Osgood’s book is used. It is 
also likely to raise the tiresome question,
what is experimental psychology?

The word “method” appears in the
title but not as a heading in the subject
index, and it can be said that the
text deals more with method in 
particular than with method in gener- 
al. One extensive evaluation and 
comparison of methods as such does 
appear, namely, that on psychophys- 
ics, but this is not wholly satisfactory. 
The method of average error is described
inadequately and, the reviewer
believes, incorrectly. The “flicker-fusion
method” is given treatment
coordinate with that of the standard 
methods even though it must itself 
employ one of these latter methods. 
Further, it appears that the author is 
more worried than he need be over the 
interpretation of the results of the 
two-category method of constant
stimuli.

To regard this work merely as a 
textbook or handbook would, how- 
ever, overlook its most significant as- 
pirations, which have to do with 
the evaluation of theory and the 
maintenance, in the author’s words, of 
“a certain continuity of approach.” 
Now, it is notorious that experiment- 
al psychology does not easily submit 
to systematic unification, and we 
regretfully judge that this proposi- 
tion is not threatened by the present 
book. The theoretical atmosphere of 
the first six chapters, on sensory and 
perceptual processes, is noticeably 
different from that of the remaining 
ten, on perceptual dynamics, learn- 
ing, thinking, and language. Osgood’s 
Hullian views have no relevance to 
such matters as the cortical basis for 
sensory quality and intensity, the 
quantal hypothesis in audition, and 
the significance of visual adaptation. 
What little integration is achieved 
for sensory psychophysics and psy- 
chophysiology comes from the au- 
thor’s interest in the mode of action 
of the cerebral cortex, where he 
prefers neurostatistical conceptions 
to the “dynamics” of Kohler. Helson's
conception of adaptation level,
which might have been used to link 
together some problems that remain 
unconnected, is not mentioned. 

Beginning, however, with the chap- 
ter on central factors in perception 
and continuing through most of the 
remainder of the book, consideration 
is given to major theorists. Tolman, 
Guthrie, the gestalt psychologists, 
and Hull are discussed extensively, 
and briefer accounts are given of 
Hebb, Skinner, and others. It is ob- 
vious that, in most instances, Osgood 
has made intense efforts to under- 
stand and to present sympathetically 
opinions with which he partly or 
wholly disagrees, as well as to exhibit 
in full relief the weaknesses of those 
by which he takes his own stand. 
How successful he has been in the
exposition of each of these views will 
certainly be judged differently by 
readers of different theoretical pre- 
dispositions. The reviewer's belief 
is that the systems of Hull (up through
Principles of Behavior; the later
books were too recent to be included), 
Guthrie, and Tolman are discussed in 
such a way as to give their followers 
little cause for complaint; indeed, 
Osgood has constructed challenging 
formalizations of the last two, with 
the aid of Voeks in Guthrie's case. 
Gestalt theory is more alien to him, 
and he makes the mistake of writing 
as if Kohler, Wertheimer, Koffka, 
Lewin, and J. F. Brown adhered to 
one system in common. 

Three pages are allotted to Hebb 
as an exponent of physiological the- 
ory. Unfortunately, the more purely 
psychological aspects of Hebb’s ideas 
are lost sight of, so that, for exam- 
ple, his contributions to the the- 
ory of thinking are not referred to 
in Osgood's chapters on this topic. 
Least satisfactory is the treatment 
accorded to Skinner. His views on
the role of theory in psychology are
omitted, and some of the remarkable 
regularities that he has established, 
which are clearly of concern to a 
learning theorist, are given little or no 
attention. This is true, for example, 
of the effects of different reinforce- 
ment schedules. Periodic reinforce- 
ment is mentioned in passing, with 
the comment that “The mechanism
whereby an animal smoothly adjusts
its rate of response under these conditions
is unknown.” “Intermittent
reinforcement” is discussed more 
thoroughly, but Skinner's name is not 
mentioned in connection with it. The 
same is true of the problem of the 
number of kinds of learning. A
rather misleading statement (p. 307) 
could cause the reader to suppose 
that Skinner accepts the concept of 
disinhibition. 

Osgood's own “mediation hypothesis,”
by which he hopes to effect a
rapprochement between “reinforcement”
and “cognitive” theories, is
not a hypothesis in the ordinary 
sense, but rather a broad extension of 
Hull’s concepts of anticipatory goal 
responses and their associated pro- 
prioceptive stimuli. The following 
statement from the chapter on lan- 
guage behavior exemplifies one of the 
functions attributed to mediators, 
that of serving as signs: 

... a pattern of stimulation which is not
the object is a sign of the object if it evokes 
in the organism a mediating reaction, this (a) 
being some fractional part of the total be- 
havior elicited by the object and (b) producing 
distinctive self-stimulation that mediates re- 
sponses which would not occur without the 
previous association of nonobject and object 
patterns of stimulation (p. 696). 

Such mediated self-stimulation has 
cue, motivating, and reinforcing 
properties. Mediators are not neces- 
sarily peripheral but may be purely 
cortical events; they are said to have 
the status of “hypothetical constructs.”
The following comments
are made by the reviewer, with the
caveat that they should be taken as 
no more than fragments of a possible 
critical analysis of the hypothesis. 
(a) The mediators are so substitutable
for the “ideas” of association
theory that occasionally the discus- 
sion sounds like a mere translation of 
such theory into Hullian language. It 
would be no defense to attack cognitive
theory on the same grounds. (b)
While the evidence submitted in fa- 
vor of the hypothesis seems to indi- 
cate that some sort of mediation is 
involved in the behaviors that are de- 
scribed, it does not, in the reviewer's 
opinion, discriminate clearly in favor 
of fractional goal responses. (c) The 
explanations by mediators have an 
essentially chainlike character, and 
are open to all of the objections that 
have been raised against such a con- 
ception since Dewey’s paper on the 
reflex arc. (d) The attribution to me- 
diators of both motivating and rein- 
forcing properties leads to the same 
extreme difficulties that leave the 
stimulus-reduction theory of rein- 
forcement hanging by a thread on 
page 443. (e) One would like more 
evidence that the hypothesis can pre- 
dict new and unexpected phenomena. 

On questions of philosophy of sci- 
ence and theory of knowledge the 
book is disconcertingly artless, and 
the thought is more energetic than 
subtle. Although Osgood calls himself
a materialist, and says that
“Behaviorally, ... the environment is a
pattern of neural energies in the central
nervous system,” and that, when
one touches something, “The awareness
of sensation is not ... in the
fingertips ... but in the brain,” he
nevertheless wrestles with the ghosts 
of introspectionism without van- 
quishing them, since they remain to 
haunt many pages of the book. At the 
same time, he shows no great sensitivity
to the problems of earlier days;
e.g., almost anything that one might
say about oneself is called “introspection.”
For examples from other contexts,
Lloyd Morgan's canon is
treated as if it were subject to experi- 
mental test; and the discussion of 
repression leads to the methodologically
awkward conclusion that motivated
forgetting “... is certainly a
valid observation in the clinic and
probably would be verified in the
laboratory....”

So ambitious a work can be ex- 
pected to contain some inaccurate 
and debatable matter. Thus, the all- 
or-none law is so stated as to make it 
depend upon the full utilization of 
“materials” in the neuron; in con- 
nection with the Talbot-Plateau law, 
it is asserted inaccurately that 
“... with a light-dark ratio of one
to one, the fused field will appear
one-half as bright as the illuminated
sector”; the color solid, which some
might regard as a minor triumph of
taxonomy, is dismissed as a mere
pedagogical device, apparently on the 
incorrect supposition that it was in- 
tended to represent the facts of color 
mixture; it is said that interference 
theories of extinction never specify
the interfering responses, whereas a 
number of such responses were identi- 
fied two pages earlier in the descrip- 
tion of research by Wendt; a refer- 
ence to work by Birenbaum and Zei- 
garnik leaves the impression that 
Lewinian theory presumes that 
boundaries between regions within 
the person are less permeable in the 
child than in the adult, when in fact 
the theory asserts the reverse. 

The book is seriously marred by 
numerous errors of grammar, spelling, 
and typography, as well as other 
blemishes of which some are matters 
of taste. It is regrettable that most 
of these were not weeded out by the 
publisher's editing. A few
miscellaneous examples follow.


The names Dashiell, Dimmick, and Boguslavsky
are misspelled; Prägnanz is spelled
pregninz; the plural of plexus is given as
plexi; e.g. is almost uniformly given the
meaning that is, and similar confusions occur
in the use of i.e., cf., and viz.; to refute is taken
to mean no more than to contest; etc. Of 
typographical errors, perhaps the most 
troublesome is the printing of a negative ex- 
ponent in an exponential expression as a sub- 
traction (p. 357). The table titles and figure 
captions are generally too sparing, and the 
text does not always refer fully, or even ac- 
curately, to the figures. Thus, the text refers 
to dashed and solid lines in Fig. 123 but only 
the latter are found; Fig. 5 is too sketchy 
for the text; hair cells are mentioned in the 
text but not identified in Fig. 5 and Fig. 24; 
the abscissa of Fig. 63 should be labeled 
millimicrons; more labels are needed for Fig. 
207; the text refers to Fig. 177 when Fig. 176 
is intended. (On the other hand, many of the 
figures are excellent, as, e.g., Fig. 89 and 
101.) It may also be mentioned that the 
type seems very small for the length of line 
on the large single-column page. 


The foregoing, however distress- 
ing, are minutiae, while Osgood's 
accomplishments are not. Scattered 
throughout the book are original 
theoretical and experimental contri- 
butions on topics ranging from color 
contrast to transfer and retroaction in 
learning. Several chapters are unusu- 
ally interesting, e.g., Chapter 7 on 
perception and Chapter 14 on prob- 
lem solving. Furthermore, the author 
is at his best in assembling the evi- 
dence for and against testable research 
hypotheses and doggedly following 
the trail of the experiments even when 
they lead in an undesired direction; 
and this is, after all, his principal 
intention. 

FRANCIS W. IRWIN. 

University of Pennsylvania. 


HAVIGHURST, ROBERT J., & AL-
BRECHT, RUTH. Older people. New
York: Longmans, Green, 1953.
Pp. xvi+415. $5.00.


This book is an admirable antidote 
to the prophets of doom and gloom 
who identify aging with senility, disa-
bility, and frustration. It is based on
a detailed survey of a small midwest- 
ern city (7,000 in the town itself and 
4,000 in the adjacent trading area) 
and presents factual data about what 
the older people there are like (670 of 
them 65 or older). Individual inter- 
views were carried out with a sample 
of 100 of the old people drawn to 
represent the entire old age popula- 
tion by sex, socioeconomic status, and 
marital status. 

The book, however, is more than a 
compilation of factual information. 
It adds a conceptual framework for 
the analysis of the needs and desires 
of individual aging people, an analy-
sis of the community’s reaction to the
aged, and a formulation of basic
principles essential to the effective
planning of programs for the aged.
Writing in an easy, flowing style,
with enough anecdotal material to
maintain a lively interest, the authors
manage to impart a great deal of fac-
tual information without the reader's
being aware of it.

The book should be general in its 
appeal, not only to those who are 
especially concerned with the prob- 
lems of aging, but also to the aging 
people themselves. Thus, the physi- 
cian who may be exposed daily to 
the aches and pains of older people 
will be agreeably surprised to learn 
that 79 per cent of the oldsters in this 
cultural environment regarded them-
selves as “healthy” and that only 6
per cent were homebound and 2 per 
cent actually bedridden. The social 
worker, who must deal with the prob- 
lem of finances, living arrangements, 
etc. among the aged, ought to know 
that 43 per cent of this population 
reported they were happily situated 
and that only one-fourth were actu- 
ally unhappy. It is also important to 
know that happiness was not related 
to economic status and that health
became a major component of happi-
ness only in the 20 per cent of the
population who had specific com- 
plaints; i.e., old age was what the in- 
dividual made it, with economic and 
health aspects assuming a secondary 
role. All but 4 per cent of the popula- 
tion reported that they had enough 
money to get along on provided they 
had no unusual medical expenses. 
Anxiety about what they would do in 
case of protracted illness requiring 
hospitalization was present in a large 
proportion of the elderly people. 
There was no evidence that the com- 
munity rejected old people because of 
their age. However, the importance 
of the role of the individual in the 
community was emphasized. In gen- 
eral, the community seemed quite per- 
missive about what it expects of older 
people and looked with favor on con- 
tinued activity. 

The authors present a number of 
concepts that may require further 
analysis. For example, their outline 
of the meaning of work shows the 
need for a variety of retirement plans 
if we are to meet the needs both of 
the individual and of society. The 
concept of socioeconomic mobility is 
applied to the psychological adjust- 
ments of aging people in an inter- 
esting manner. Whether all of the 
conceptual formulations will prove 
useful remains to be seen, but the au- 
thors are to be complimented on their 
willingness to theorize and thus re- 
duce a mass of specific observations to 
some intelligible formulation. 

Psychologists with a quantitative 
psychometric viewpoint will be dis- 
appointed by the use of nonscaled 
questionnaire techniques, and sociolo-
gists may be disturbed by the inclu-
sion of a chapter on “A Personal and
Social Philosophy of Old Age” which
makes specific recommendations
about “rational defenses” that can be
used by older people. One might also
quibble about whether the popula- 
tion of Prairie City is representative 
of the entire United States. How- 
ever, the solid facts of how one group 
of aging people behaved in a definable 
environment have been admirably 
explored by this study. Only by simi- 
lar studies in different community 
environments can the generality of 
the findings be settled. 

It is good to know that in some 
community structures a large propor-
tion of the older people can optimisti-
cally meet the problems of aging.
This is a book that should be read
by everyone interested in the welfare of
his own community.
N. W. SHOCK.
Baltimore City Hospitals.


FEDERN, PAUL. Ego psychology and
the psychoses. (E. Weiss, Ed.) New 
York: Basic Books, 1953. Pp. 375. 
$6.00. 


The editor had the difficult task of 
organizing the presentation of a sys- 
tematic theory of ego functioning out 
of another man’s half-century (1901- 
1952) of prolific publishing (almost 
100 articles, lectures, etc.). The 40- 
year association as pupil and friend 
of Paul Federn provided important 
background for this task. In the 21-
page introduction, Weiss attempts to
establish Paul Federn’s orthodox
loyalty to Dr. Freud (to take care of
the “minor” differences in the area of
ego theory), to apologize for Federn’s
difficulties in exposition which is “often
very rich in content but...
complex etc.” (p. 21), and to present
a condensed, clarifying guide to the
metapsychological wilderness of the
text. He apparently did not feel that 
Federn could provide in 16 articles 
and 340 pages a sufficiently clear 
presentation of his thoughts, but un-
fortunately the guide itself needs a
guide. The first five “theoretical
papers” are a maze of unclear verbal
gymnastics, poorly defined concepts,
and repeated self-adulation by
Federn which does not sufficiently re-
ward the reader for his difficult labor
of seeking meaning. In this part of
the book, Federn shares a difficulty
common to many psychoanalytic au-
thors who consider operationalism a
primitive scientific procedure and
prefer an artistic verbal medium.
The second part of the book (nine
papers) is a fuller measure of Dr.
Federn’s contribution. Here, his rich
clinical experience and observational
astuteness appear to good advantage.
A conception of the ego differing in
many respects from that postulated
by Freud is presented. “Ego Feeling”
seems to be viewed as a basic energy
which integrates ego functioning and
makes ego experience possible. An
interesting hypothesis suggesting a
somatic and mental differentiation of
ego feeling is developed. There is a
much-repeated emphasis on psychosis
as an ego-defeat phenomenon, as
against the neurotic dynamics of ego
defense. The so-called “Steinach
effect” is defended as a sufficient
basis for the sterilization of latent
psychotics and psychotics. However,
the psychological rationale given in
Federn’s papers hardly constitutes a
convincing argument.

This book is not the ‘‘Vade Medi- 
cum” for the treatment of psychosis 
which the publishers enthusiastically 
proclaim on the book-jacket blurb. It 
is certainly not recommended for
general reading. However, if the
reader is familiar with psychoanalytic
terminology and has the perseverance
to follow Federn through “thick and
thick,” then he may find in the papers
many provocative speculations based
on extensive clinical experience.
There might even be the reward of
deriving fruitful hypotheses for ex-
perimental investigation from some of
these speculations.
M. ERIK WRIGHT.
University of Kansas.


McCURDY, HAROLD GRIER. The
personality of Shakespeare. New 
Haven: Yale Univer. Press, 1953. 
Pp. xi+243. $5.00. 

Almost twenty years ago Caroline 
Spurgeon published the results of her 
extensive study of Shakespeare’s 
figures of speech from which she 
drew a number of inferences concern- 
ing Shakespeare’s interests, preoccu- 
pations, attachments, personality 
traits, personality development, and 
the like.¹ Her study was based upon
the premise that a person reveals 
things about his personality by the 
kinds of metaphors he uses. 

McCurdy has tried to get at the 
personality of Shakespeare by a dif- 
ferent route. He has counted the 
number of lines that every character 
in every play speaks in order to de- 
termine the principal characters in 
each play. The assumption is made 
that the more important features of 
Shakespeare’s personality are repre- 
sented in the principal characters. In 
addition to character analysis, Mc- 
Curdy has analyzed some of the ma- 
jor themes that run through a num- 
ber of plays much as one would 
analyze a TAT protocol. 

For anyone who admires Shake- 
speare and who feels that the psy- 
chologist has something to offer in the 
way of method for shedding more 
light upon the kind of person that 
Shakespeare must have been in order 
to write as he did, this is a fascinating 
study. Dr. McCurdy is dedicated to 
his subject and writes with sensi- 
tivity and critical devotion. 

¹ Shakespeare's imagery and what it tells us.
New York: Macmillan, 1935.

Most psychologists do not quarrel
with the hypothesis that a writer
projects himself into his writings,
and that it should be possible there-
fore to make a personality analysis of
a writer from his writings alone. The 
same hypothesis probably holds for 
painters, composers, architects, de- 
signers, and anyone who produces 
something out of his imagination. 
However, Shakespeare is the worst 
possible subject on whom to test the 
hypothesis. Virtually nothing is 
known about the man. In fact, so 
little is known that a number of 
people believe that someone else 
must have written the plays at- 
tributed to Shakespeare. Such being 
the case, how can one ever hope to 
confirm or infirm the inferences that 
are made about the personality of 
Shakespeare from his writings? 

I wish that Dr. McCurdy had de- 
voted his considerable talents both as 
a psychologist and as a literary critic 
to the examination of a writer about 
whom a great deal is known. Hem- 
ingway or Mickey Spillane would 
make excellent choices. Is it true, for 
instance, that Mike Hammer repre- 
sents the shadow-side of Spillane’s 
personality? How much of the old 
man of the sea is to be found in 
Hemingway? McCurdy’s methods
of analysis would undoubtedly strike
pay dirt and reveal a great deal about 
the nature of projection if he used 
them on more appropriate subjects. 

CALVIN S. HALL. 

Western Reserve University. 


STOLUROW, LAWRENCE M. (Ed.) 
Readings in learning. New York: 
Prentice-Hall, 1953. Pp. viii+555. 
$6.00. 

Stolurow has collected 42 biblio- 
graphical items from the field of 
learning, either in whole or in part, 
and by combining three into one unit 
and two into another, has presented 
39 articles under one cover. His pur-
pose was to provide a sample of origi-
nal material to supplement secondary
source materials since “only by read-
ing and analysing original reports can
the student learn how to conduct a
variety of different types of research
and at the same time become aware of
the problems and difficulties in-
volved.” Presumably this book was
intended to serve as a second text in
a course in theories of learning or per-
haps as a source book in a course in
which no text was used, although
Stolurow suggests that, “Where a
laboratory is available, this volume 
could serve as a laboratory text. The 
experimental studies could be used as 
models for both experiments and 
report writing, and the theoretical 
articles as bases for new studies.” 

It is a fairly safe estimate that at 
least 5,000 bibliographical items have 
been published in the past 70 years 
from which Stolurow chose 42. His 
selections could well be analyzed in 
terms of whether they are essentially 
systematic as opposed to experimen- 
tal articles, in terms of how well they 
represent the various problem areas 
that have been of concern to people 
interested in learning, in terms of how 
well various theoretical points of view 
are represented, and in terms of 
whether these particular items are in 
fact good models. 

The book is divided into eight sec- 
tions or chapters. The first of these, 
titled ‘‘Some Systematic Positions,” 
contains six units representing eight 
of the 42 articles selected. These 
eight and seven others in later chap- 
ters, something over 35 per cent of the 
selections, are essentially nonexperi-
mental. In addition, “Every chapter
and article contains some theory.
This is a sign of the times.” Thus the
book is very heavily weighted in favor
of theory and its exposition, although
it does contain 27 detailed research
reports.

Stolurow’s selections are not repre-
sentative of points of view. He re- 
ports that he based his selections on 
five points of quality and the addi- 
tional criteria that selections should 
be recent rather than old, human 
rather than animal, and S-R rather 
than gestalt. His chapter on system-
atic positions has selections from
Thorndike, Hull, Guthrie, Estes, and
Skinner, representing S-R positions,
and Tolman, who is somewhat diffi- 
cult to classify. The other systematic 
or nonexperimental articles are by 
Hull, Mowrer, Pavlov, Skinner, 
Spence, Tolman, and Woodworth. 

Aside from the S-R bias, which is 
confessed and deliberate, the nature 
of the selection process is partially 
revealed by who is omitted. There 
are, by rough count, 947 references 
cited in this list of 39 items. Thirteen 
men have ten or more references on 
this list. Stolurow has selected arti- 
cles from eight of these men, but has 
not selected any from the writings of 
Hilgard, Krech (Krechevsky in the
bibliography), McGeoch, Maier, or
Neal Miller. Each has published at
least one “classic” article in the
period from which these items were
selected. Twenty-two were published
between 1947 and 1950 and the re-
maining 20 between 1928 and 1946.

The 27 articles which report experi-
ments in detail are heavily weighted
in favor of experiments involving hu-
man subjects (18) as against those
involving animals (9). This weight-
ing follows largely from the choices
of subject matter represented. There
are two experimental articles on con-
ditioning concepts and techniques,
four on motivation and reinforce-
ment, six on motor and verbal learn-
ing, three on discrimination and
perceptual learning, two on educa-
tional and social learning, five on re-
tention and forgetting, and five on
transfer.

As models of research design, there
can be little quarrel with Stolurow’s 
selections. They are some of the 
better examples of the kind of article 
usually found in the Psychological 
Review, the Journal of Experimental 
Psychology, and the Journal of Com- 
parative and Physiological Psychology, 
from which most of his selections 
were taken. They are also typical of 
the writing style of these journals 
and are good models if one wishes to 
perpetuate this style. If one wishes 
to teach a more intelligible style, 
some of these selections can be used 
as examples of what not to do. 

The book is printed by a photo-
offset process. The right-hand mar-
gins of the pages are not justified.
The paper used is sufficiently trans-
parent that printing from the back
of the page and from the next page
shows through. The binding is a hard
cover, but it is covered with paper
rather than cloth and the spine of my
copy is broken on both sides from the
handling it has received in preparing 
this review. It is a shabby job of 
book making and, from this stand- 
point, overpriced by a very wide 
margin. 
EDWARD L. WALKER. 
University of Michigan. 


PSYCHOTHERAPY RESEARCH GROUP, 
PENNSYLVANIA STATE COLLEGE, 
(Wm. U. Snyder, Chairman). 
Group report of a program of re- 
search in psychotherapy. State Col- 
lege: Pennsylvania State Coll. 
Press, 1953. Pp. iii+179. $2.25.

Regardless of divergent viewpoints
about the merits of client-centered
counseling, much credit is due Rogers
and his students for their pioneering
and ingenious research with ver-
batim interview recordings. This
report is part of a sequence which
began little more than a decade ago,
but it already represents a second
generation effort, being the col-
laborative production of nine doc- 
toral students directed by William 
U. Snyder. The report consists of a 
description of the research program, 
nine thesis condensations, a summary 
and discussion by Dr. Snyder, a bibli- 
ography, and a detailed appendix 
that provides the rating and coding 
procedures utilized in the studies. 

The basic sample of cases studied 
was 100 student counselees from which
maximum N samples were drawn 
depending upon various selection cri- 
teria, e.g., number of interviews, 
transcribability of interviews, tests 
taken, etc. In addition to analyses 
based upon the recorded interviews, 
use was made of pre- and postcounsel- 
ing tests. These tests were the Ror- 
schach, the MMPI, and the Mooney 
Problem Check List. Further data 
were provided by ratings made by 
clients, counselors, and independent 
judges using such devices as a post- 
counseling client scale, a therapist 
personality scale, and a posttherapy 
counselor check list. Somewhat sur- 
prisingly in view of recent trends, no 
Q sorts were employed. 

Among the topics investigated were 
the reasons for early dropouts, the 
relationships between counselor and 
client characteristics, the develop- 
ment of a composite criterion for 
measuring client progress, the pre- 
dictability of client verbal behavior 
during counseling, indices of resist- 
ance, and comparison of the charac- 
teristics of more and less successful 
cases. 

A number of new hypotheses and 
variables have been introduced in 
these studies and, despite the fact 
that most of the findings are incon- 
clusive, the report is an important 
contribution to research methodol- 
ogy. Many readers will doubtless be 
disturbed by the continuing defense 
offered for the position that the effec-
tiveness of therapy can be evaluated
without the necessity of external cri- 
teria. 
LEONARD S. KOGAN. 
Institute of Welfare Research, 
Community Service Society of 
New York. 


STEPHENSON, W. The study of behav-
ior. Chicago: Univer. of Chicago
Press, 1953. Pp. ix+376. $7.50.

If Stephenson had set forth his
purposes in writing this book, which
carries the subtitle “Q-technique and
its methodology,” the reviewer's task
would be easier. Our inference is
that his aims were the following: (a)
to challenge much of current method-
ology in psychology, (b) to explain
Q-methodology and (c) to show by
illustration how it can put psychol-
ogy's “house in scientific order,” and
(d) to demonstrate that theory test-
ing and scientific conclusions are pos-
sible on the basis of a single case.
Perhaps when the author speaks of
the “platform upon which we are to
campaign,” he is telling us that his
sole aim is to promote Q-methodol- 
ogy. 

We shall not attempt to list here 
all of the concepts and all of the 
people against which and whom 
Stephenson arrays himself for battle. 
He does not have any faith in ordi-
nary factor analysis (R-methodol-
ogy), in measurement, in norms, in
large samples, or in any so-called
generalizations springing therefrom.
He admits that he alone is “in step
and all others out” (p. 348), but this 
does not keep him from citing what- 
ever supporting fragments he can 
find, whether these be found in the 
writings of J. R. Kantor or of J. M. 
Keynes or of some very obscure per- 
son. His sallies, courageously set 
forth, will be found either interesting 
or irritating, according to the pro-
clivity of the reader.

With regard to purpose b, one would
expect that an author who complains
because such intellects as Godfrey
Thomson and Cyril Burt have mis-
understood his writing would make a
special effort at clear and concise 
exposition. Instead, we find a poorly 
organized, piecemeal presentation, 
more confusing than enlightening. 
Thus, we can anticipate continuing 
misunderstanding, and consequent 
misuse of Q-methodology. 

The second half of the book is
devoted mainly to applications of Q-
methodology in the areas of type psy-
chology, questionnaire analysis, so-
cial psychology, self-psychology, per- 
sonality, projective tests, and clinical 
psychology (a chapter to each area). 
We are told that “Q-technique has 
its applications in almost every nook 
and cranny of psychology in its re-
search aspects” (p. 338) and “in every
branch of psychology where behavior
is at issue” (p. 343). If we are to
judge from the given illustrative ap- 
plications, the quoted claims repre- 
sent wishful thinking. 

Clinicians and others who, by 
necessity or by choice, deal with a 
single case will find comforting re- 
assurance and be motivated to read 
further by “We are to work, instead,
with a single person, at the call of a
theory. Yet we shall reach valid,
scientific conclusions” (p. 5). The
continuous stress on the merits of
the single case leads ultimately to “In
principle, one may work scientifically
for a lifetime with a single case” (p.
343). Unfortunately, by the time
one has spent a lifetime developing a
set of principles for predicting (or
explaining) every fragment of be-
havior of a single case, the subject
will have ceased to behave. Or
another logical conclusion to this sort
of thing is that psychologists must
develop two and a half billion “sci-
ences” to explain the behavior of the
two and a half billion human inhabi-
tants on this planet! What of the 
task of animal psychologists? 

The fact that our comments have 
been restricted to general reactions 
should not be misconstrued as indi- 
cating that the margins of the re- 
viewer's copy of the book are free of 
specific questions. Far from it. 

QUINN MCNEMAR. 

Stanford University. 


GOLDHAMER, HERBERT, & MARSHALL,
ANDREW W. Psychosis and civiliza-
tion. Glencoe, Ill.: The Free Press, 
1953. Pp. 126. $4.00. 


This slim volume consists of a re- 
print of two statistical studies in 
the frequency of mental disease. The 
first and more significant investiga- 
tion is concerned with an analysis of 
admissions to mental hospitals in 
Massachusetts and Oneida County, 
New York, extending back to 1840. 
Consistent with data previously re- 
ported by others for the past half 
century, the present authors found 
that for age groups under 50 there 
has been no increase in the frequency 
of psychoses over the past 100 years. 
This finding should put an end to 
the recurring myth that psychoses 
are a product of the stress and strain 
of modern life. The second paper 
presents expectancy rates of mental 
disease. It differs from earlier studies 
in that the tables prepared state the 
risk of admission to a mental hospital 
between any two points of an indi- 
vidual’s life. This method of presen- 
tation serves to accentuate the high 
incidence of admissions for the older 
age group. If a 60-year-old male sur- 
vives to age 85 he runs a 10 per cent 
risk of being admitted to a mental 
hospital. 

JAMES D. PAGE.

Temple University.


FRENCH, THOMAS M. The integration
of behavior. Vol. II. The integrative
process in dreams. Chicago: Univer.
of Chicago Press, 1954. Pp. xi
+367. $6.50.

When the history of twentieth cen- 
tury psychology is written, it will un- 
doubtedly be characterized as the 
period when two great streams of 
psychological thought, one flowing 
from the laboratory, the other from 
the clinic, converged and ran to- 
gether to form a unified science of 
dynamic psychology. The future 
historian will observe that the dialec- 
tical process which eventuated in the 
synthesis of experimental and clinical 
psychology took a long time and had 
to overcome many obstacles. Among 
the obstacles that he will discuss, one 
at least is apparent to us today, 
namely the insularity of the pro- 
ponents of each of these major orien- 
tations. This insularity prevents one 
side from communicating with the 
other, so that each remains ignorant 
of what the other stands for. 

Fortunately for psychology there 
are indications that the iron curtain 
of insularity is being lifted and that 
a real exchange and integration of 
ideas are beginning to take place. 
The name-calling era is drawing to 
a close. Psychologists brought up in 
the tradition of experimental psy- 
chology are reading and being insemi- 
nated by psychoanalysis, and psycho- 
analysts, though to a lesser extent, 
are reading and being inseminated 
by experimental psychology. An out- 
standing example of a psychoanalyti- 
cally trained investigator whose 
thinking has been fertilized by inter- 
course with systematic experimental 
psychology is Thomas French, asso- 
ciate director of the Chicago Institute 
of Psychoanalysis. 

In a number of articles published 
during the last 20 years, French has
demonstrated his ability to synthe-
size the two orientations. Now, in 
his impressive work in progress, three 
volumes of which are still to be pub- 
lished, we are privileged to witness 
the culmination of his integrative en- 
deavors. The present volume is an 
application of the basic postulates 
set forth in Volume I to an under- 
standing of dreams. 

Essentially, what French has done 
is to graft Tolman’s cognitive theory 
onto Freud's motivational theory. 
French’s key concept is cognitive
structure, by which he means “a
hierarchy of plans for achieving an
end-goal.” Implicit in this definition
is the concept of motive since every 
plan involves striving toward a goal. 

French not only believes that every 
dream has a cognitive structure but 
also that the cognitive structures of 
different dreams of the same person 
form a pattern. The cognitive struc-
ture of a dream is discovered by using
various sources of evidence, namely,
information about the dreamer, free
associations, translation of symbols
by functional analysis, and compari-
sons of one dream with the other
dreams of a series. All of this infor-
mation is blended together by em-
ploying the method of internal con-
sistency, which is the favorite method
of psychoanalytic investigators. The
end result is a comprehensive under-
standing of the dreamer’s conflicts,
their relative intensities and inter-
relationships, their historical origins
and contemporary significance, their 
underlying physiological and motiva- 
tional patterns, the dreamer's plans 
and capacities for resolving them, 
and his hopes of success. 

The rhetorical strategy of the book 
consists of discussing the dreams of 
a patient who is undergoing the type 
of psychoanalytic treatment prac-
ticed at the Chicago Institute of Psy-
choanalysis. Some readers may ques-
tion whether the strategy is a success-
ful one. This reader found that it be-
came quite tedious to try to follow
and bear in mind all of the intricacies 
of analyzing the dream series. It 
may well be that every idea is related 
in some way to every other idea, and 
that there is almost an infinity of 
components and levels in the cogni- 
tive structure of a particular human 
mind, but I wonder whether there is 
not a better way to get this across.
Perhaps not. It seemed to me that 
there was an unconscionable amount 
of redundancy and that more severe 
editing could have made the book 
more readable. French's use of dia- 
grams is an aid to quick understand- 
ing and should have enabled him to 
abbreviate the extended discussions. 

In spite of these literary defects 
the book is a solid contribution to the 
major task of twentieth century psy- 
chology. 

CALVIN S. HALL. 

Western Reserve University. 





BOOKS AND MONOGRAPHS RECEIVED 


ANASTASI, ANNE. Psychological test- 
ing. New York: Macmillan, 1954. 
Pp. v+682. $6.75. 

AUSUBEL, DAVID P. Theory and prob-
lems of adolescent development. New
York: Grune & Stratton, 1954.
Pp. xviii+580. $10.00.

BELLOWS, ROGER M., & ESTEP, M.
FRANCES. Employment psychology:
the interview. New York: Rinehart,
1954. Pp. xxi+295. $4.25.

BRAY, DOUGLAS W. Issues in the
study of talent. New York: King’s
Crown Press, 1954. Pp. xi+65.
$2.00.

CARMICHAEL, L. (Ed.) Manual of
child psychology. (2nd Ed.) New
York: Wiley, 1946, 1954. Pp. ix
+1295. $12.00.

CHINOY, ELY. Sociological perspec-
tive; basic concepts and their appli-
cation. Garden City: Doubleday,
1954. Pp. vi+58. $8.85.

CRONBACH, LEE J. Educational psy-
chology. New York: Harcourt,
Brace, 1954. Pp. viii+628. $5.50.

FINK, KENNETH H. Mind and per- 
formance; a comparative study of 
learning in mammals, birds and rep- 
tiles. New York: Vantage, 1954. 
Pp. xi+113. $3.00. 

FINLAY, WILLIAM W., SARTAIN, A. Q.,
& TATE, WILLIS M. Human be-
havior in industry. New York:
McGraw-Hill, 1954. Pp. xi+247.
$4.00.

FRUCHTER, BENJAMIN. Introduction
to factor analysis. New York: D.
Van Nostrand, 1954. Pp. v+280.
$5.00.

DE FOREST, IZETTE. The leaven of
love. New York: Harper, 1954.
Pp. xvi+206. $3.50.

FREUD, SIGMUND. The origins of 
psycho-analysis; letters to Wilhelm 
Fliess, drafts and notes: 1887-1902. 
(Marie Bonaparte, Anna Freud, 
and Ernst Kris, Eds.) New York: 
Basic Books, 1954. Pp. xi+486. 
$6.75. 

FRYER, DOUGLAS H., et al. General
psychology. (4th Ed.) New York:
Barnes & Noble, 1954. Pp. xix+
300. $1.50.

GEDDES, DONALD PORTER. (Ed.) An
analysis of the Kinsey reports on
sexual behavior in the human male
and female. New York: E. P. Dut-
ton, 1954. Pp. x+319. $3.50.

GINZBERG, ELI, et al. Psychiatry and 
military manpower policy. New 
York: King’s Crown Press, 1953. 
Pp. xi+66. $2.00. 

JERSILD, ARTHUR T. Child psychol-
ogy. (4th Ed.) New York: Pren-
tice-Hall, 1954. Pp. v+676.

KATZ, DAVID, CARTWRIGHT, DOR-
WIN, ELDERSVELD, SAMUEL, &
LEE, ALFRED McCLUNG. Public
opinion and propaganda. New
York: Dryden, 1954. Pp. xx+779.
$6.25.

McGILL, V. J. Emotions and reason.
Springfield: Charles C Thomas,
1954. Pp. xiii+122. $3.25.

MICHAL-SMITH, H. (Ed.) Pediatric
problems in clinical practice; special
medical and psychological aspects.
New York: Grune & Stratton,
1954. Pp. x+310. $5.50.

PENNINGTON, L. A., & BERG, I. A.
An introduction to clinical psychol-
ogy. (2nd Ed.) New York: Ron-
ald, 1954. Pp. viii+709. $6.50.

REMMERS, H. H., RYDEN, E. R., &
MORGAN, C. L. Introduction to
educational psychology. New York:
Harper, 1954. Pp. ix+435. $4.00.

RUNES, DAGOBERT D. Letters to my
daughter. New York: Philosophical
Library, 1954. Pp. 131. $2.50.

WULFECK, J. W., & BENNETT, E. M.
The language of dynamic psychol-
ogy; as related to motivation re-
search. New York: McGraw-Hill,
1954. Pp. 111. $4.00.

ZILBOORG, G. The psychology of the 
criminal act and punishment. New 
York: Harcourt, Brace, 1954. Pp. 
xi+141. $3.50. 








FUNDAMENTALS of PSYCHOANALYTIC TECHNIQUE 
By the late Trygve Braatoy, formerly of Ulleval General Hospital, Oslo. 
[...]gy of psychotherapy. Emphasizing the importance of description 
in [...], this book begins with a discussion of the emotions of the analyst 
and ends with an examination of special and general aspects of 
interpretation. At all points the argument is supported by ample clinical 
material. 1954. 404 pages. $6.00. 


LEARNING THEORY, PERSONALITY THEORY, and CLINICAL RESEARCH 
The published record of a symposium on the relationships among these 
three areas held at the University of Kentucky in 1953. Eleven distinguished 
contributors provide a comparison and an interesting original account of 
the most recent thinking in these fields. 1954. 164 pages. $3.50. 


The STUDY of PERSONALITY 
A Book of Readings 
With commentary by Howard Brand, University of Connecticut. A stimu- 
lating survey of original articles on theories, methods, and problems con- 
cerning personality. Research problems covered range from child develop- 
ment and the acquisition of personality traits in early experiences to theories 
of therapy. 1954. Approx. 569 pages. Prob. $6.00. 


The FOUNDATIONS of STATISTICS 
By Leonard J. Savage, University of Chicago. The author develops, ex- 
plains, and defends an abstract theory of the behavior of a highly idealized 
[...] Concentrating on the human aspects of 
probability as it is reflected in economic behavior, Savage reopens the 
question of the personalistic view and clarifies the pros and cons surrounding 
it. One of the Wiley Publications in Statistics, Walter A. Shewhart and 
Samuel S. Wilks, Editors. 1954. 294 pages. $6.00. 


MANUAL of CHILD PSYCHOLOGY, Second Edition 
Edited by Leonard Carmichael, Smithsonian Institution. Twenty-two 
authorities concentrate on the major aspects of child psychology and show 
how appropriate techniques have helped obtain a large body of reliable 
[...] up-to-date throughout, the new edition contains three com- 
pletely new chapters. “. . . the most thorough and comprehensive publica- 
tion I am familiar with in the field. . . . an indispensable source book . . .”— 
Professor F. S. Freeman, Cornell University. 1954. 1295 pages. $12.00. 


Send for on-approval copies 


JOHN WILEY & SONS, Inc.  440-4th Avenue, New York 16, N. Y. 


GEORGE BANTA PUBLISHING COMPANY, MENASHA, WISCONSIN 





